Root cause analysis and escape defect analysis improvement at continuous delivery by data-driven decision-making
Saapunki, Katja (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for Your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023060117188
Abstract
This thesis was carried out for the software development organization of a global telecommunications vendor. The main driver was the ongoing transition to continuous integration (CI) and continuous delivery (CD) practices in the telecommunications sector, which challenges current software development methods with fast delivery cycles and high software quality requirements. One of the areas affected by this transition is root cause analysis (RCA) and escape defect analysis (EDA).
The main targets of this study were to identify pain points in the company’s current RCA/EDA practices and to define how RCA/EDA could be shifted left in the software development life cycle so that it better supports CI/CD in fulfilling customer expectations.
The research was a mixed-methods study with an embedded design. Quantitative data was collected from the company’s internal tools and databases. Qualitative data was collected by interviewing company employees involved in RCA/EDA work and by taking part in the company’s internal workshops on RCA/EDA. Data analysis and visualization were done in Power BI. The results combined quantitative and qualitative data, although the bulk of the research was quantitative.
According to the analysis, the main pain points of the current RCA/EDA practice were the long lead time of RCA/EDA, a high ratio of completed analyses without any preventive actions named or implemented, and imprecise RCA/EDA targeting, which caused software faults to leak to customers.
To support CI/CD better, improve the customer experience, and manage effort and costs more efficiently, three main enhancements were proposed:
1. Manage RCA/EDA decisions for internal defects through automated, data-driven decision-making that takes customer impact into account.
2. Introduce metrics that support prioritization and give visibility into the whole pipeline, from defect reporting to implemented preventive actions.
3. Use RCA as a tool for finding the root causes of things that went well, to enable learning from successful practices and processes at the organization level.
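As a rough illustration of the second enhancement, the two pain-point figures named above (RCA/EDA lead time and the share of completed analyses without preventive actions) could be computed along the following lines. This is a minimal sketch and not the thesis's implementation, which used Power BI on the company's internal data: the `Analysis` record and its field names are hypothetical, and the median is only one possible choice of summary statistic.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Analysis:
    """One RCA/EDA case. All field names are hypothetical."""
    defect_id: str
    reported: date              # day the defect was reported
    completed: Optional[date]   # day the analysis was completed, None if still open
    preventive_actions: int     # number of preventive actions named or implemented


def lead_time_days(a: Analysis) -> Optional[int]:
    """Days from defect report to completed analysis, None for open cases."""
    if a.completed is None:
        return None
    return (a.completed - a.reported).days


def median_lead_time(analyses: list[Analysis]) -> Optional[float]:
    """Median lead time in days over completed analyses."""
    days = [d for a in analyses if (d := lead_time_days(a)) is not None]
    return median(days) if days else None


def share_without_actions(analyses: list[Analysis]) -> Optional[float]:
    """Fraction of completed analyses that have no preventive actions."""
    done = [a for a in analyses if a.completed is not None]
    if not done:
        return None
    return sum(1 for a in done if a.preventive_actions == 0) / len(done)
```

Open cases are left out of both figures here. Whether they should instead count toward lead time (for example, as age to date) is a design decision this sketch does not settle.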
Machine learning opportunities in software fault handling and RCA/EDA work were not considered in this study but were identified as an opportunity for further research.