Monday, March 13, 2017

My Internship at WSO2


1      Orientation Program
1.1      Introduction
1.2      First Timer Chase
1.3      Product Introductions
1.4      Development environment
1.5      Project Introductions and Allocation
2      Internship Project
2.1      Improving WSO2 Support Knowledge Base Version 1.0
2.2      Releasing WSO2 Support Knowledge Base Version 2.0
2.3      Creating a conversational chat bot
2.4      Documentation Knowledge Base
2.5      Releasing WSO2 Support Knowledge Base Version 3.0
2.6      Introductory Sessions
3      Other Activities
3.1      Training Sessions
3.2      Recreational Activities
3.3      Inter-house badminton tournament
3.4      “Smart@ss” Quiz
3.5      Inter-floor basketball tournament
3.6      Tea time Championship
4      Conclusion
ACKNOWLEDGEMENTS
REFERENCES

1        Orientation Program

1.1        Introduction

We joined WSO2 as interns on the 25th of July 2016. During the first five days an orientation program was conducted to familiarize us with the WSO2 environment, and several info sessions were held to make us aware of various aspects of the company. On the very first day of the internship we stepped into the main office of WSO2 at 20, Palm Grove, Colombo 03. We were warmly welcomed by Mr. Charitha Bandara, Senior HR Lead, together with Ms. Pramila Rajapaksha, Director of HR & Admin. The one-week orientation program was conducted by Mr. Charitha Bandara as a friendly discussion that resolved our doubts and introduced the WSO2 culture.
He explained how a flat hierarchy is maintained within the company and how each member is treated equally. It was emphasized that we should address everyone by their first name, including the CEO of the company. They also briefly explained the history of WSO2 and the business model it uses. There is no dress code in the company, and we were free to wear casual outfits.
Mr. Charitha Bandara further explained that office hours are flexible and that there would be no hard deadlines for our work, though we had to be responsible for ourselves. Permanent employees at WSO2 have the option of working from home, but interns do not get that chance. From the instructions given in the induction program we realized that WSO2 is a pleasant place to work.
After the session concluded, we were asked to collect our corporate laptops from Mr. Akila Basnayaka, and we were divided among the four houses of WSO2 (‘Cloud Bots’, ‘Legions’, ‘Wild Boars’ and ‘Titans’); I was allocated to the house ‘Titans’.

1.2        First Timer Chase

On the third day we participated in a quite interesting event called the ‘First Timer Chase’. The intention of this event was to make the interns familiar with the WSO2 premises and to let us meet the employees and interact with them. It was quite fun: we were divided into the four houses of WSO2 and further divided into groups of three people. Each team had to complete a list of tasks given on paper within a limited time. Since it was a competition, all of us tried to complete them as quickly as possible.

1.3        Product Introductions

During the 4th and 5th days of the orientation program a few technical sessions were held to introduce some of the main products of WSO2, namely WSO2 Carbon, WSO2 Application Server, WSO2 API Manager, WSO2 Enterprise Service Bus, WSO2 Identity Server, WSO2 Enterprise Mobility Manager & IoT Server and WSO2 Data Analytics Server. These sessions were conducted mostly by the product team lead or a team member.

1.4        Development environment

After getting the laptops we were asked to set up the development environment and register our MAC addresses so we could access the organization’s Wi-Fi network.
Ubuntu - We were instructed to install the Ubuntu OS since it provides more flexibility to developers and has thousands of free and open source development packages available. Since I had previous experience working with Ubuntu, it was easy for me to get used to it.
Java - Since almost all WSO2 products are Java applications, the default programming language used at WSO2 is Java.
Eclipse and IntelliJ - Eclipse and IntelliJ IDEA are the two main Java IDEs used by developers at WSO2.

1.5        Project Introductions and Allocation

On the first day of the second week, introduction sessions for all the available internship projects were held by the mentors. Mr. Selvaratnam Uthaiyashankar asked us to send our top five project preferences by the end of the day. After I sent my preferences I was assigned to the project “Intelligent Support Bot on top of NLP based KB”, which was mentored by Dr. Srinath Perera (Vice President - Research), Mr. Samisa Abeysinghe (Vice President - Delivery) and Mr. Nirmal Fernando (Associate Technical Lead). We were also each assigned a direct mentor for the training period, and mine was Mr. Samisa Abeysinghe.

2        Internship Project

The next day Madhawa Vidanapathirana (who worked on the same project with me during the training period) and I met Mr. Samisa Abeysinghe, and he explained the objectives of the project to us. He asked us to read the documentation created by the two previous interns who worked on the project “WSO2 Support Knowledge Base”. I went through the documentation, got an idea of the current state of the system, and created a milestone plan and a weekly plan for the project.

Figure 1 Milestone Plan - Gantt chart
We identified the two main phases of the project as improving the current system and developing an intelligent bot on top of the improved knowledge base.

2.1        Improving WSO2 Support Knowledge Base Version 1.0

After discussing with our mentor we identified two main areas to improve in the existing version of the Support KB system:
1.      Improving the summarization of the results
2.      Improving the solution classification

2.1.1       Improving the summarization

We implemented a sentence tagging scheme to tag sentences and thereby improve summarization. I undertook the following tasks:
1.      Implementing Tagger to introduce / remove tags
2.      Implementing an algorithm to untag tagged sentences, escaping tag-like sequences that appear in the original text
3.      Implementing TagFilter to filter a given type of tagged sentence
Sentence Tagging Scheme
Sentences in Issue Description and Issue Comments of an issue are tagged as Plain Text, Question, Answer, Ignorable, Code or Log.
This tagging scheme is based on the classification obtained using the Stanford Classifier [2] trained separately for QAI (question, answer and ignorables) and CLN (code, log and normal text) tagging schemes.
The Tagger class was implemented to introduce / remove tags, and TagFilter was implemented to filter a given type of tagged sentence and to get the filtered, untagged text. The following tags are introduced by this scheme.
Table 1 Tagging Scheme used by Tagger
Tag - Description
\qs - Question sentence start
\qe - Question sentence end
\as - Answer sentence start
\ae - Answer sentence end
\is - Ignorable sentence start
\ie - Ignorable sentence end
\cs - Code sentence start
\ce - Code sentence end
\ls - Log sentence start
\le - Log sentence end

Stanford Column Classifier
The Stanford Column Classifier is a machine learning based classifier developed by Stanford University and distributed freely under the GNU GPL Version 2.0 license for non-commercial use. Given records of single-column or multi-column label-tagged data as a training set, the Stanford Classifier can accurately guess the label of an arbitrary record with the same number of columns. The column data types should match and can be either textual or numeric. The classifier must be pre-trained using a training set, and if the training set changes the classifier has to be re-trained; however, the training process is very fast. [2]
A separate Java application was developed to expose the Stanford Classifier’s features to the Support KB Python application. The two applications were interfaced using the Py4J framework. [3]
The Stanford Column Classifier was extensively used in the Support Knowledge Base System.
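As a rough illustration of that Python-to-Java interfacing, a Py4J call from the Python side looks like the sketch below; the entry point method name 'classify' is an assumption, since the actual interface of the Support KB's Java wrapper may differ.

from py4j.java_gateway import JavaGateway

# Connects to a GatewayServer started by the Java-side classifier application.
gateway = JavaGateway()

# 'classify' is an assumed method on the Java entry point, shown only to
# illustrate calling the Stanford Classifier wrapper from Python.
label = gateway.entry_point.classify("ERROR - Connection refused to 10.0.0.1:9443")
print(label)  # e.g. 'log'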
fastText by Facebook
fastText is a library for efficient learning of word representations and sentence classification developed by Facebook Research. It provides a C++ text classification system with accuracy similar to the Stanford Column Classifier. Additionally, fastText claims to be faster than the Stanford Classifier at training. [4]
I trained and tested the Facebook fastText classifier using the same training set we used for the Stanford Classifier, and it gave similar results, with around 70% accuracy.
However, since we had already started development using the Stanford Classifier and it provided sufficient performance, the switch to fastText was not made.
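For reference, fastText's supervised mode expects training data with a __label__ prefix on each line; the small helper below (an illustrative sketch, not part of the Support KB code, with an assumed output file name) shows how the same labelled sentences could be written out in that format.

def to_fasttext_format(examples, path):
    # examples: iterable of (label, sentence) pairs, e.g. ('question', 'How do I ...?')
    with open(path, 'w', encoding='utf-8') as out:
        for label, sentence in examples:
            out.write('__label__' + label + ' ' + sentence.strip() + '\n')

to_fasttext_format(
    [('question', 'How do I cluster the ESB?'),
     ('answer', 'Enable clustering in axis2.xml.')],
    'qai.train')  # file name is illustrative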

2.1.2       Improving the solution classification

We then started the next phase of improving the Support KB: improving the Solution Classifier. We made an initial design for a keyword weighting scheme for the Solution Classifier. The scheme gives a weight to each keyword according to where it matched (the title, questions in the description, the rest of the description, or the answer comments) and the number of times it occurred. We decided to store the weights in two places.
1.      Summaries collection – each issue will have a keyword list with weights
2.      Keywords collection – the mapped issues will have the total weight they obtain from the mapped keywords
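A minimal Python sketch of the general idea is shown below; the location weights and names are illustrative assumptions, not the values actually used (which, as noted next, were revised several times).

# Illustrative location weights - assumptions, not the tuned values.
LOCATION_WEIGHTS = {
    'title': 4.0,
    'description_question': 3.0,
    'description_rest': 1.0,
    'answer_comment': 2.0,
}

def score_keywords(keyword_occurrences):
    # keyword_occurrences: keyword -> list of locations where it matched
    scores = {}
    for keyword, locations in keyword_occurrences.items():
        scores[keyword] = sum(LOCATION_WEIGHTS.get(loc, 1.0) for loc in locations)
    return scores

# A keyword matched once in the title and twice in answer comments:
print(score_keywords({'proxy service': ['title', 'answer_comment', 'answer_comment']}))
# {'proxy service': 8.0}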
The initial weighting scheme did not give the expected results, so we changed it several times and documented the changes; we were still in the process of finding a good scheme.
I wrote the AnswerCommentsKeywordExtractor class to extract keywords from the answer comments of issues. It calls the old Java text classifier application twice for each issue, which made KB generation very slow and became a problem.
I researched the NLTK WordNet lemmatizer as a replacement for the Porter stemmer for getting root words when extracting keywords, but it did not seem to offer much improvement. With the WordNet lemmatizer it is not possible to find a common root word for all related words. For example, the lemmatizer gives 'generalize' for 'generalized' and 'generalizing' when the part-of-speech (POS) tag parameter is used, but not for 'generalization'. I posted a question on Stack Overflow [5] regarding this. In the end I used lemmatization [6] for keywords along with stemming. Since the WordNet lemmatizer does not recognize the Treebank POS tags given by NLTK, I had to use a POS tag converter.
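The POS tag conversion amounts to mapping Treebank tags to the WordNet constants before calling the lemmatizer, roughly as in this sketch (a simplified illustration, not the exact converter used in the project):

from nltk import pos_tag, word_tokenize
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

def treebank_to_wordnet(tag):
    # Map Penn Treebank tags (from nltk.pos_tag) to WordNet POS constants.
    if tag.startswith('J'):
        return wordnet.ADJ
    if tag.startswith('V'):
        return wordnet.VERB
    if tag.startswith('R'):
        return wordnet.ADV
    return wordnet.NOUN

lemmatizer = WordNetLemmatizer()

def lemmatize_text(text):
    return [lemmatizer.lemmatize(word.lower(), treebank_to_wordnet(tag))
            for word, tag in pos_tag(word_tokenize(text))]

# With an explicit verb tag the lemmatizer gives 'generalize', but the noun
# 'generalization' keeps its own root, as described above.
print(lemmatizer.lemmatize('generalizing', wordnet.VERB))    # generalize
print(lemmatizer.lemmatize('generalization', wordnet.NOUN))  # generalization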
I researched using WordNet synsets to identify synonyms of the search keywords to improve search accuracy. I decided to extract synonyms for the keywords taken from the search text and include them in the keyword matching algorithm. I implemented the SynonymFinder and integrated it with the SolutionGenerator for synonym matching of search keywords. Once synonym matching was implemented, the system was taking a long time to generate the solution; the reason was that the SynonymFinder was trying every combination of all synonyms of all keywords. Therefore I limited the number of synsets used to four. Since WordNet synsets are ordered with the most common senses first, this produced good results.
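The SynonymFinder idea can be sketched with NLTK's WordNet interface as below (a simplified illustration; the real class and its integration with the SolutionGenerator are more involved):

from nltk.corpus import wordnet as wn

def find_synonyms(keyword, max_synsets=4):
    # Synsets are ordered most-common-first, so limiting them to four keeps
    # the number of synonym combinations manageable at search time.
    synonyms = set()
    for synset in wn.synsets(keyword)[:max_synsets]:
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace('_', ' ').lower())
    synonyms.discard(keyword.lower())
    return synonyms

print(find_synonyms('error'))  # e.g. {'mistake', 'fault', ...}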
After improving the system we started making the system ready for production.

2.2        Releasing WSO2 Support Knowledge Base Version 2.0

We started to prepare the system for deployment on the production server. I began by fixing the Admin-Full-Answer view to comply with the new summarizing scheme. While doing this I encountered a problem with the Angular app: it could not get the data of the result (the full answer of a ticket) passed when the page is rendered by the Python app. As a solution I added a hidden field to the HTML containing the issue id; then, from the Angular app, I used a jQuery call to read the issue id and added a new route in the Python app that returns the full answer as JSON.
After preparing the Support KB code for production deployment we tested the deployment on the staging server. To deploy on the server, the Support KB needs to run as a WSGI [7] application. To test this I created a test Flask application [8] in a virtual environment and deployed it on the Apache server using mod_wsgi. [9]
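The WSGI entry point for such a Flask application is essentially a small Python file that mod_wsgi loads, in which it looks for a callable named 'application'; a minimal sketch is shown below (the path and module name are assumptions, not the actual deployment layout):

# supportkb.wsgi - minimal mod_wsgi entry point for a Flask app.
import sys

# Make the project (and its virtualenv packages) importable for the Apache process.
sys.path.insert(0, '/var/www/supportkb')  # assumed project path

# The Flask app object is assumed to be created in app.py as 'app'.
from app import app as application

# The Apache virtual host then points WSGIScriptAlias at this file.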
Then I created a WSGI application for the Support KB and configured the staging machine to run it. A few problems occurred while making this work on the server. First, the WSGI app was not working at all; I spent some time looking for a configuration error or misconfiguration but could not find the reason, so I reinstalled Apache on the server and configured it from the beginning, after which it worked. Then the search in the Support KB was not working, and I had to track down the cause of that problem as well.
Then I deployed the Support KB on the production instance using the same procedure as on the staging machine. We set up DNS to point the existing domain name at the IP address of the new production instance, finished deploying “WSO2 Support Knowledge Base Version 2.0”, and asked the support-dev mail group to try it out and give us feedback.

Figure 2 WSO2 Support Knowledge Base Version 2.0 release

2.3        Creating a conversational chat bot


2.3.1       Secondary Solution Generation and Classification

After the discussion with Mr. Samisa we started working on secondary solution classification/generation, which would lead to the development of the bot. The plan was to take the top five solutions for a search from the existing Support KB, identify the questions asked in those tickets, and create a mapping to the answers provided in the ticket comments. Then we would rank the identified questions according to their relevance to the initial search text, find the best answer or answers, and compile an answer for the search problem.
I started implementing the Question Identifier, the first part of secondary solution generation. It takes the top five solutions from the Support KB search results, identifies the question sentences of each ticket, and returns a list of questions. A problem we faced here is that tickets are described and answered in many different ways, which makes it hard to identify questions and map them to answers. We therefore decided to identify several types of descriptions and implement an Answer Matcher for each type.
We identified six types of tickets that needed Answer Matchers to implement the question-answer mapping for secondary solution generation:
1.      Question list in description, QA List in comments
2.      Direct Question in descriptions
3.      Short Question with a scenario described
4.      Short scenario description without question
5.      Several short questions with scenarios
6.      Several short scenario descriptions without questions
I implemented the Answer Matchers for types 3 and 4. For type 1 we can directly extract mapped questions and answers (implemented by Madhawa). For types 2, 3 and 4 the answer can be found in the comments (a reasonable answer can be assumed to be in the first comment) and a proper answer should be compiled using all the comments. But for types 5 and 6 it is not possible to identify proper questions and answers, since there is no standard structure.
We had a meeting with the mentors and discussed the difficulty of extracting information from unstructured tickets and presenting it properly, and the difficulty of generating answers. The conclusion was that a significant technological advancement would be needed to achieve it.
Natural language answer generation, which is needed to generate the bot’s answers, is a complex field that is still developing. We could not take the project further with the currently available natural language generation resources, so we decided to research conversational bot building platforms.

2.3.2       Conversational Bot

There are two types of chat bots:
  1. Rule-based bots, which respond only to specific commands.
  2. More advanced bots, which use machine learning.
From my research I found that we can build bots on platforms like Facebook Messenger and Slack, using services like wit.ai and Howdy’s Botkit (open source). I started building a test chat bot on Slack using Howdy’s Botkit. [10]
The Slack bot was implemented using Node.js. When a question is asked of the bot it retrieves results from the Support KB and answers using those solutions. The open source tool Howdy’s Botkit was used for bot creation.
The bot was then integrated with the Slack platform. [11]

Figure 3 Support BOT integrated with Slack
First I implemented an endpoint in the Support KB’s app.py to provide search results to the bot. This endpoint returns the top five search results for the search text (the asked question) in JSON form.
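In Flask terms, such an endpoint is roughly the sketch below; the route name, parameter and search call are assumptions used for illustration, with a stub standing in for the actual Support KB search:

from flask import Flask, jsonify, request

app = Flask(__name__)

def search_support_kb(question):
    # Stub standing in for the actual Support KB search logic.
    return [{'issue_id': i, 'title': 'Issue %d' % i, 'summary': '...'} for i in range(10)]

@app.route('/api/bot_search')  # assumed route name
def bot_search():
    question = request.args.get('q', '')
    top5 = search_support_kb(question)[:5]
    return jsonify(top5)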
Then I implemented a bot function to get the search results from the Support KB and answer with the comments of the top search result. The answer was not user friendly, so I improved the presentation of the solution by giving the title and a short description first, and then answering only with the first comment.
Then I implemented an interactive search narrow-down using the prominent keywords of the top search results, based on the Support KB endpoint response, which contains a mapping from prominent keywords to issue IDs. With this addition the bot asks the user for the most relevant keyword from the list provided by the endpoint and, according to the user’s reply, provides the narrowed-down answer.
At this point the bot code had become somewhat complicated, with plenty of callbacks. This kind of situation is called ‘callback hell’: callbacks nested within other callbacks several levels deep, which can make asynchronous JavaScript difficult to understand and maintain. To avoid it, the code has to be kept shallow, modularized, and errors have to be handled properly. I therefore refactored the bot code, shallowing it by extracting functions and function calls; more refactoring using the Promises [12] structure was still needed.
Therefore I started modifying the code to use Promises, which are a good way to handle asynchronous calls. After adding promises to the HTTP GET calls there was a problem adding promises to the bot’s ask() method; there were several ways to do it but none of them worked, so I had to wrap the ask() calls in new functions that return promises.
After adding promises to the existing bot, I started implementing a feedback system for it. The plan was to give the solution first, ask questions to get feedback on the relevance and completeness of the answer, and then provide more solutions based on the feedback.
Implementing the feedback system was not successful because the conversation ended prematurely due to a race condition in the asynchronous callback handling with promises. I found some temporary workarounds on the web, but they did not work, and removing the convo.next() calls until the asynchronous call finished did not help either.

2.4        Documentation Knowledge Base

After discussing with Mr. Samisa we planned to start implementing a new knowledge base based on the WSO2 documentation. For this we first chose a single article of the ESB documentation, extracted its HTML source, and parsed it into a tree structure to extract knowledge. I started implementing the WebPageTreeGenerator to generate a hierarchical tree from the web article.
First I implemented filtering an HTML document by tag id when parsing it into the web page tree. Then I implemented fetching the HTML content of a web page given its URL.
Up to this point we had been working on a sample page of the documentation and extracting knowledge from it. I then looked into a way to download the full WSO2 documentation to extract knowledge for the knowledge base. I found a Python framework named Scrapy [13], which can be used to implement web crawlers that extract content from web pages. Using this framework I implemented a web crawler named ‘DocScraper’ which recursively follows the URLs on a given page and extracts the HTML content.
The main page of the WSO2 documentation [14], https://docs.wso2.com/, does not have the ‘wiki-content’ div which I used to extract the HTML content, so I could not use it as the starting URL for the DocScraper, since the crawler extracts URLs from the ‘wiki-content’ div. Therefore I started extracting from the ESB documentation.
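A heavily simplified sketch of the DocScraper idea using Scrapy is shown below; the start URL and the ‘wiki-content’ selector are taken from the description above and are assumptions about the real implementation.

import scrapy

class DocSpider(scrapy.Spider):
    name = 'doc_spider'
    allowed_domains = ['docs.wso2.com']
    # Illustrative ESB documentation start page (assumption).
    start_urls = ['https://docs.wso2.com/display/ESB500/WSO2+Enterprise+Service+Bus+Documentation']

    def parse(self, response):
        content = response.css('div.wiki-content').get()
        if content:
            # Store the page content for the KB generator.
            yield {'url': response.url, 'html': content}
            # Recursively follow links found inside the content div.
            for href in response.css('div.wiki-content a::attr(href)').getall():
                yield response.follow(href, callback=self.parse)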
Then I integrated the DocScraper with the database generator that Madhawa implemented, and we generated the documentation knowledge base by running the application on the staging machine.
I implemented an ArticleGenerator to generate the articles collection from the doc_pages collection of the database, which was generated by scraping the WSO2 documentation with the DocScraper. Then I modified it to generate keywords as well while generating the articles collection. There was an error caused by binary files in the doc_pages collection, so I added a try/except block to handle non-HTML documents.
I added app.py to the documentation_kb with a route to display an article and its information when the article id is given; this is the basic bot for the documentation KB. I added the front end HTML to show the article information. Since we stored the HTML content of the articles, I rendered that HTML in this view. Then I added access to the parent nodes of the article and to the sub-articles of title type. I filtered out the other types of sub-articles, since their ‘content’ field is too long and the actual sub-articles we want to navigate are the title-type ones.
The ArticleGenerator, which generates the knowledge base from the scraped HTML, had missed some pages. I modified the scraper to start from the WSO2 documentation root page instead of the ESB page, and added a separate parse method for the root page since it does not have a ‘wiki-content’ div.
I added a check against the doc_pages collection in the doc_spider to make sure a page is not generated again even if the application is stopped and re-run. Then I added a new collection to keep track of generated pages when generating articles, so that generation can resume from where it stopped.
The search in the documentation knowledge base misses results when the important keywords are not in the title structure, because the current search only uses the title hierarchy for keyword extraction. We started implementing a secondary filter that takes the top results and checks the TF-IDF [15] value of the unmatched keywords against the content. I added a method to get the unmatched keywords given the best matching articles, and a method to get the best matching article by checking the TF-IDF of the content.
I then modified the secondary keyword matching method to use the TF-IDF scores given by the simple_tfidf calculation for the unmatched keywords of the search text.
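The secondary filter boils down to computing a TF-IDF style score for the unmatched keywords against the article content and re-ranking the top results; a simplified sketch is shown below (the field names are assumptions, and the project's simple_tfidf calculation may differ in detail):

import math

def tf_idf(keyword, article_text, docs_containing, total_docs):
    # Standard TF-IDF of a keyword in one article's content.
    words = article_text.lower().split()
    tf = words.count(keyword.lower()) / max(len(words), 1)
    idf = math.log(total_docs / (1 + docs_containing))
    return tf * idf

def rescore_by_content(unmatched_keywords, top_articles, total_docs):
    # Re-rank the top articles by summing content TF-IDF of the unmatched keywords.
    scored = []
    for article in top_articles:
        score = sum(tf_idf(kw, article['content'],
                           article['doc_freq'].get(kw, 0), total_docs)
                    for kw in unmatched_keywords)
        scored.append((score, article))
    return [a for _, a in sorted(scored, key=lambda pair: pair[0], reverse=True)]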
I implemented the ArticleClassifier to classify articles by the type of the search question. This was implemented after a discussion about classifying by question type ('what', 'how', 'is', 'does', 'why', 'which') and classifying matching articles accordingly. We expected this classification to improve the search, since it classifies articles in a way that identifies what kind of an answer the user needs.
I modified the ArticleClassifier to pos_tag article topics and save the resulting pattern in the database. Then I implemented a training data set generator for the Stanford Classifier which uses the saved pos_tag patterns from the database, and generated the article_type collection by manually classifying topics using the ArticleClassifier to create the training set.
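The pattern stored for each topic is essentially its sequence of POS tags, which then becomes one column of a training row for the Stanford Classifier; a small sketch of that step follows (the function names are illustrative, not the actual code):

from nltk import pos_tag, word_tokenize

def topic_pos_pattern(topic):
    # e.g. 'Clustering the ESB' -> a tag sequence such as 'VBG DT NNP'
    # (the exact tags depend on the tagger).
    return ' '.join(tag for _, tag in pos_tag(word_tokenize(topic)))

def training_row(question_type, topic):
    # One tab-separated row: label column + POS-pattern column.
    return question_type + '\t' + topic_pos_pattern(topic)

print(training_row('how', 'Clustering the ESB'))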
I added the generate_test_set method to generate the test data set for the Stanford Classifier to classify article topic types, and the update_with_types method to classify articles in the articles collection using the trained Stanford Classifier.
I then separated the keyword mapping collection generation from the article collection generation, since it was too slow and crashed during generation. We isolated the article generation and ran it, and I added keyword mapping collection generation as a separate method.
Then I trained and tested the accuracy of the topic type classifier. The classification accuracy averaged 80.7% for classifying articles by question type (how, what, why, does). The next step was applying the topic type classification to the search.
I added the SearchQuestionClassifier to get the question type of the search text. It takes the search text and, using the trained article_type classifier, returns the relevant question type. This is used in the smart_search algorithm to identify the user’s question type.
Then I added the base layout template for the frontend, followed by the front end templates for the show_articles page and the search page with their JavaScript files.
Then I added a route in app.py to get the best results from smart_search, which are generated using a combination of simple_search (which takes all the keywords from the article parents and content) and data learned from the vote-up function. I displayed the best articles from smart_search on the search page using an AJAX call.
Then I added article links to the search results. The next steps were some improvements to the frontend, such as correcting the links from the documentation, adding the topic hierarchy path of the article, and highlighting the first article.
The Web Application
The system had to be exposed to external users over the internet, so we exposed its functionality via a web app. The web app was expected to let a user specify a query or a set of keywords and retrieve a concise view of articles from the documentation knowledge base or issues from the support knowledge base. In addition it was expected to provide certain capabilities to the admins of the system.
Since the system was implemented in Python we used the Flask framework, which, although a microframework, is powerful and simple to use.
Although the web app was a simple one, we paid special attention to the design of the interfaces, as we wanted to make them as simple and user-friendly as possible so that engineers could find what they want as quickly as possible. The app had the following main interfaces:
·         Search Interface -
The Search (Main) Interface, shown in Figure 4, provides the user with a search box to enter the query or set of keywords, and buttons to click either ‘Search Docs’, to search the documentation KB, or ‘Search Tickets’, to search the Support KB.
·         Documentation Search Results Interface -
The Documentation Search Results interface, displayed to all users, shows the top solutions obtained from the intelligent search on the documentation KB once the user specifies the set of keywords. A recommend button is given for each displayed result.
If the user wants to know more about an article, they can click on its topic, which displays the full article page rather than the concise view given in the search results page, along with a link to the original source (the WSO2 documentation page).
·         Tickets Search Results Interface -
The Tickets Search Results interface, displayed to all users, shows the top solutions obtained from the intelligent search on the Support KB once the user specifies the set of keywords. For each result, the fields displayed are the issue summary, the view count and a short description. A more detailed view can be opened by clicking ‘view additional information’. Code/log blocks are collapsed and can be viewed by clicking ‘view code/log’. Question sentences and matched keywords are bolded to improve the presentation. A recommend button is given for each displayed result.
If the user wants to view the answer, they can click the “View Solution” button of an issue, which displays the summarized answer and the link to the original source (the JIRA ticket [16]). Clicking the “View More” button also increments the view count for the issue.

Figure 4 Search Interface
·         Article/Issue Interface -
The full article/ Issue page can be viewed by clicking on article topic or issue title.
·         Admin Interface -
The admin interface, accessible only to admins, provides several capabilities as described below:

Figure 5 Admin Interface - View Usage
§  View Usage - As shown in Figure 5, it is possible to get an idea of the number of logins and searches performed.
§  View Most Recent Searches - To view the most recent search queries run against the knowledge base.
§  View Most Recent Votes - To view the most recent recommendations made by users.
§  View Most Viewed Issues - To get an idea of the most popular topics searched in the knowledge base.
§  Update Knowledge Base -
o   JIRA - To specify a time period, update the knowledge base with Jira tickets from that period, and view the progress of the update.
o   Documentation - This is done in 3 steps:
·         Update Knowledge Sources (extract the WSO2 documentation) - When the content of the documentation has changed, the documentation has to be re-scanned and the knowledge base re-generated.
·         Update Knowledge Base - Generate the articles for the knowledge base using the extracted doc pages.
·         Keyword Indexing - Keywords are indexed and the keyword-to-article mapping is created.
§  Manage Admins - To add/remove admins

2.5        Releasing WSO2 Support Knowledge Base Version 3.0

Having implemented the above, we deployed Support Knowledge Base Version 3.0 to get feedback from actual users. Users with a wso2 email address can access the deployed system via https://supportkb.wso2.com/. Initially the system was only available through the private network of WSO2, but later it was made accessible over the internet as well, provided the user is authenticated by the AppM login system.
We used Apache mod_wsgi to host the web app, considering its reliability and the fact that Apache was widely used in the company.

Figure 6 WSO2 Support Knowledge Base Version 3.0 release

2.5.1       Support Knowledge Base Version 3.0

Requirement
In addition to the public Jira, all paying customers of WSO2 who receive WSO2 Support Services are entitled to individual Jira accounts through which they can raise the issues they identify as Jira tickets. WSO2 engineers working in support then work on these tickets and provide solutions through back-and-forth communication via comments on the ticket. If the answers provided by WSO2 resolve the customer’s query, the ticket is marked as resolved/closed. Furthermore, internal discussions are carried out on the “support-dev” mailing group, where engineers discuss among themselves what the possible solutions for an issue could be. There was no central database maintaining this information, and when a new issue was raised WSO2 engineers had to manually search through the documentation, past resolved/closed issues, mail threads and other resources to find out whether a solution already existed.
What the WSO2 Support Knowledge Base was expected to deliver was an internal knowledge base that maintained past resolved issues, in terms of Jira tickets, documentation and Gmail threads, and eventually other resources such as OSQA and Stack Overflow questions. When a user (a WSO2 engineer) specifies an issue, either in natural language or as a set of keywords, the system should identify similar solved issues from the past and other relevant resources and suggest them along with summarized answers built either from the comments on the Jira ticket, the replies to the Gmail thread, or content from the WSO2 documentation. The weighted term frequency calculation used for keyword matching is described under the architecture section below.
Advantage
The main advantage of the WSO2 Support Knowledge Base is that it significantly reduces the time an engineer spends identifying the solution to an issue that has already been resolved. Without it, a user would manually search the Jira system, Gmail threads and WSO2 Docs for possible solutions, whereas the Support Knowledge Base is a one-stop solution that instantly suggests similar issues by searching across all these resources. As this reduces the time the engineer spends on identifying possible solutions, it leaves more time to focus on the actual solution.
Furthermore, this helps ensure that no similar issues are missed through human error, so that a support engineer does not reinvent the wheel or spend valuable time on an already resolved issue.
These advantages of the WSO2 Support Knowledge Base not only ensure that the support engineers use their valuable time and efforts effectively but also help improve the efficiency and effectiveness of the WSO2 Support System.

Architecture
A high level architectural view of the system is shown below in Figure 7.

Figure 7 High level architecture
Support Tickets Knowledge Base
The Support Tickets Knowledge Base is a knowledge base built on the Support JIRA tickets of WSO2. A user can search this system for issues similar to a current issue that have occurred in the past.
Knowledge Base Creation for Support Tickets KB - The TicketsKBGenerator extracts tickets from the WSO2 Support JIRA system into the backend database of the KB through the API exposed by the JIRA system.
Afterwards, the sentences in the ticket description and ticket comments are classified and tagged as “Question”, “Plain Text”, “Code”, “Log” or “Ignorable” using the Stanford Classifier. Additionally, the comments of tickets are classified as “Question Comment”, “Answer Comment” or “Ignorable Comment” using the Stanford Classifier. The tickets are then further processed to identify keywords, and an index of keywords is generated. The index contains a weighted term frequency calculated for each keyword-ticket combination based on a customized version of the Term Frequency - Inverse Document Frequency (TF-IDF) algorithm.
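For reference, the standard TF-IDF score that this customized weighting builds on is tf-idf(t, d) = tf(t, d) x log(N / df(t)), where tf(t, d) is the frequency of keyword t in ticket d, df(t) is the number of tickets containing t, and N is the total number of tickets; the customization here lies mainly in weighting tf(t, d) by where the keyword occurs, along the lines of the keyword weighting scheme described in Section 2.1.2.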

Figure 8 WSO2 Support Tickets Knowledge Base - KB Generator Architecture
Solution Generation for Support Tickets KB - The Solution Generator of the Support Tickets KB implements the “Search Tickets” functionality of the WSO2 Knowledge Base. It is basically a keyword based search with enhancements based on user recommendations of search results (the Continuous Learning System).
The solution generator first extracts keywords from the search text using the RAKE [17] algorithm. It then uses the keywords collection in the database to identify support tickets matching the query. The results are sorted based on the total TF-IDF value calculated for each search result, and the top 10 results are chosen to be displayed to the user.
Afterwards, the Continuous Learning System boosts the scores of user-recommended solutions, and the results are re-sorted based on the new scores.
Finally, the summarizer summarizes the search results based on the tags on the sentences of the description and comments, and the results are displayed to the user.

Figure 9 WSO2 Support Tickets Knowledge Base - Solution Generator Architecture
Documentation Knowledge Base
The Documentation Knowledge Base is a knowledge base built on the official WSO2 documentation available at https://docs.wso2.com. This KB contains sections of documentation pages (whole pages, headings and subheadings of pages, and paragraphs) arranged in a single giant hierarchical tree structure. A user can search for any WSO2 product related inquiry in this system and obtain the relevant sections.
Knowledge Base Creation for Documentation KB - Knowledge base creation from the WSO2 documentation is a lengthy process, subdivided into the three steps explained below.
1.      Documentation Extraction - Extract pages in WSO2 Docs using the Scrapy web crawler (refer to DocumentationExtractor.py for the implementation).
2.      Articles Generation - Divide the extracted pages into sections of headings and paragraphs (called Articles in this document). Then arrange the headings and paragraphs into a hierarchical structure representing the structure of the documentation page (e.g. paragraphs listed under a heading are identified as sub-items of the heading, and sub-headings listed under a heading as its sub-headings). This hierarchical structure of the page is then connected to the overall hierarchical tree generated for the WSO2 Docs by linking it to the appropriate parent page node.
3.      Keywords Generation - Keywords are generated for each article recognized by the Article Generator. The keywords of an article are extracted from its own text and from the titles of its ancestor headings. These keywords are then indexed into a keywords collection (known as article_keywords) with term frequency values, in a similar manner to the index of JIRA ticket keywords in the DB. Additionally, a separate index named description_keywords is generated for headings; it includes the keywords from article_keywords plus keywords extracted from the immediate child paragraphs/headings.

Figure 10 WSO2 Docs Knowledge Base - KB Generator Architecture
Solution Generation for Documentation Knowledge Base - The Solution Generator of the Documentation KB implements the “Search Docs” functionality of the Support Knowledge Base. It is basically a keyword based search with enhancements based on user recommendations of search results (the Continuous Learning System).
The Solution Generator first extracts keywords from the user’s inquiry using RAKE. The keywords are then searched in the description_keywords index and the matching articles are obtained. The articles are sorted based on the total TF-IDF value calculated for each article, the top 10 solutions are selected and re-sorted based on user recommendations by the Continuous Learning System, and finally the results are displayed to the user.

Figure 11 WSO2 Doc Knowledge Base - Solution Generator Architecture

2.6        Introductory Sessions

We discussed and planned introductory sessions for the support teams and made a script for the sessions, which included a hands-on session to try the system with current support issues. We conducted five sessions, for the Virtusa support team, the ESB support team, the IS support team, the APIM support team and the permanent support team. The objective of these sessions was to get the employees familiar with the “Support Knowledge Base” system and using it.
Session Outline
1.      A basic introduction to the system, introducing both the documentation search and the tickets search and the two knowledge bases behind them.
2.      Introducing the learning feature we implemented in the application, through which users can recommend and promote solutions so that the system improves over time.
3.      Asking the audience to access the system and demonstrating how it works, showing some example questions for both searches.
4.      Explaining the scenarios that best fit each search. Users can use ‘Search Docs’ to search for a WSO2 product related question.
Eg:
·         How to configure ESB with Identity server?
·         What are the system requirements for Identity Server?
·         How to cluster ESB?
5.      Users can use ‘Search Tickets’ to search for support questions related to bugs and issues in the products.
Eg:
·         How to control the hostname that gets set for accessing the WSDL in ESB?
·         How to preserve header in SOAP content sent by ESB?
6.      Asking to try out the system with their current ticket issues.
7.      Doing a Q & A session.
8.      Taking feedback.
Feedback from the sessions
1.      Extending the tickets based search to include matches from all ticket comments.
2.      Extending the tickets based search to include sections identified as Logs or Code.
3.      Extend the KB to Blogs, Articles, White Papers, and Stack Overflow etc.
4.      Extract knowledge from OSQA system as well.
5.      Bring all sources for Support Engineers to one place.
6.      Remove the limit of tickets shown as solutions.
7.      Automating Knowledge Base Update
8.      Improve Performance of Doc KB Search
9.      Add a Spell Checker for the Search.
10.  Add a suggestions feature to the Search
11.  Use the Metadata Document on Blogs

3        Other Activities

3.1        Training Sessions

I participated in several training sessions conducted for WSO2 employees, organized by the HR team and delivered by senior employees. I took part in the following sessions.
1.      GIT beyond basics
2.      Docker training
3.      Server administration training

3.2        Recreational Activities

During lunch time and tea time we took part in several recreational activities, such as table tennis, carrom, pool, foosball and playing musical instruments. This was a great opportunity to get to know employees we could not meet during work and to have fun.

Figure 12 Playing Table Tennis

3.3        Inter-house badminton tournament

An inter-house badminton tournament was organized and we took part in it. My doubles partner was Mr. Jithendra from the Infrastructure team. It was a great chance to meet new people and have fun.

Figure 13 Inter house badminton Championship

3.4        “Smart@ss” Quiz

This was a general knowledge quiz organized by the HR team for the employees. Each house had two teams; I was in the ‘Titans-2’ team.

Figure 14 Smart@ss Quiz

3.5        Inter-floor basketball tournament

An inter-floor basketball tournament was held; I took part in the 2nd floor team and we emerged as 1st runners-up, losing only the final to the 7th floor team. We got Pizza Hut vouchers as gifts and our team later went on a lunch outing.

3.6        Tea time Championship

A tea time championship was organized for table tennis, carrom, foosball and pool, and I took part in all of them. I lost at foosball and carrom. In table tennis I got into the semifinals in men’s singles, men’s doubles and mixed doubles, but could not finish because my internship ended before the tournament concluded.

4        Conclusion

On the 23rd of December 2016 we concluded our projects and finished our internship at WSO2. We handed over our office equipment to Mr. Dinesh. That day we also took part in the WSO2 year-end party at the Eagle Lakeside Hotel and had a lot of fun.
We had our internship presentations at WSO2 on the 2nd of March 2017, and after all the presentations WSO2 took us for a go-karting and dinner outing. Pramila and Nirshan from HR joined us and we had great fun racing.
We are very grateful for the way WSO2 treats its interns, in addition to the superb learning experience they provide. A BIG THANKS to WSO2!

ACKNOWLEDGEMENTS

I would like to express my gratitude to everyone who supported me in making my internship a success and a memorable experience in my life.
First of all I would like to thank Dr. Chathura R. De Silva, Head of the Department of Computer Science and Engineering, University of Moratuwa, and Dr. Dilum Bandara, Industrial Training Coordinator of the Department of Computer Science and Engineering, for the immense effort taken to provide us with the best training establishments and for the commitment shown in making sure each and every student was selected by a well-established software development company.
I would like to extend my gratitude to Mr. Nihal Wijeyewickrema, Director of the Industrial Training Division, University of Moratuwa, and all the members of the Industrial Training Division for their efforts to make our stay at these training establishments a pleasant one. I must also thank all the members of the National Apprentice and Industrial Training Authority (NAITA) for guiding us from the very beginning and for the work done throughout the internship period to make it a success.
I am also grateful to Dr. Sanjiva Weerawarana, Founder, Chairman and CEO of WSO2, for allowing me to be a part of the WSO2 family and giving us an invaluable opportunity to learn world-renowned technologies within a great working environment. I would like to thank Ms. Pramila Rajapaksha, Director of Human Resources and Administration at WSO2, and Mr. Samisa Abeysinghe, Vice President - Delivery, who was my mentor during the internship period. I would also like to thank all my colleagues in the WSO2 Support team for being there for me whenever I needed guidance. Furthermore, I would like to thank each and every person at WSO2 Lanka (Pvt) Ltd who helped me in various ways to have an awesome internship experience at WSO2.
Last but not least, I take this opportunity to thank my fellow interns at WSO2 who were there with me during my internship period, sharing the joy as well as the work. This experience would not have been the same without you.

REFERENCES

[1] "WSO2 Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/WSO2.
[2] "Stanford Column Data Classifier," [Online]. Available: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/classify/ColumnDataClassifier.html.
[3] "Py4J," [Online]. Available: https://www.py4j.org.
[4] "Facebook fastText Classifier," [Online]. Available: https://github.com/facebookresearch/fastText.
[5] "StackOverflow Lemmatization," [Online]. Available: http://stackoverflow.com/questions/39302880/getting-the-root-word-using-the-wordnet-lemmatizer.
[6] "Lemmatisation," [Online]. Available: https://en.wikipedia.org/wiki/Lemmatisation.
[7] "WSGI," [Online]. Available: https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface.
[8] "Flask," [Online]. Available: http://flask.pocoo.org.
[9] "mod_wsgi," [Online]. Available: http://flask.pocoo.org/docs/0.12/deploying/mod_wsgi/.
[10] "Howdy’s Botkit," [Online]. Available: https://github.com/howdyai/botkit.
[11] "Slack Bot Integration," [Online]. Available: https://my.slack.com/services/new/bot.
[12] "PromisesJS," [Online]. Available: https://www.promisejs.org/.
[13] "Scrapy," [Online]. Available: https://scrapy.org.
[14] "WSO2 Documentation," [Online]. Available: https://docs.wso2.com/.
[15] "TF-IDF," [Online]. Available: https://en.wikipedia.org/wiki/Tf%E2%80%93idf.
[16] "JIRA," [Online]. Available: http://wso2.com/library/1146/.
[17] "RAKE," [Online]. Available: https://github.com/zelandiya/RAKE-tutorial.
