In the first blog of this series – Canary In The Coal Mine: Actionable-Insights To Mitigate Development Risks And Red-Flags – we introduced ‘Project Canary’, a data-driven framework that predicts and flags risks so that engineering teams can intervene early to mitigate them. While the previous blog covered the different parameters and signals that the framework tracks to raise alerts, in this second installment we showcase how the Canary framework reacts to those signals and communicates them, so that actionable, data-driven insights reach teams while there is still time to act.
How does ‘Project Canary’ interpret and understand the warning signals?
In recent years, Artificial Intelligence (AI) and Machine Learning (ML) have evolved into key technology building blocks, allowing computers to use data and algorithms to predict outcomes and identify potential threats and vulnerabilities. Given sufficient data and time, ML systems can be extended with deep learning techniques to predict outcomes more accurately. However, it is also important not to get carried away by the possibilities of AI/ML. Sometimes it is necessary to take a step back, list the problems we are trying to solve, and evaluate whether AI/ML is the silver bullet for all of them or whether a hybrid approach is needed – one that combines multiple AI/ML models and techniques based on the individual feature set used as a leading indicator.
In the case of Project Canary, we first started with simple statistical models as a baseline and in parallel, after a lot of data curation, we are gradually moving towards developing deep learning models to improve predictions.
How is the data gathered and pre-processed?
Statistical computation and/or machine learning is a good first step in most ‘prediction’ use cases, provided a sufficient amount of historical data and a good understanding of the input feature set are available. If your machine learning model doesn’t have quality data feeding into it, the resulting predictions will be highly inaccurate. Hence, data pre-processing is necessary to clean the data and convert it into a form in which leading indicators can be used for predictions. This is often a challenging and time-consuming process.
In the case of Project Canary, we too had to do a fair amount of pre-processing due to the unstructured nature of the enterprise data. We also had to integrate data gathered from multiple sources before it could be analyzed using AI/ML techniques.
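To make this concrete, here is a minimal sketch of the kind of cleaning and integration involved, assuming hypothetical records from a task tracker and a timesheet tool. All field names and values are illustrative, not Canary’s actual schema:

```python
from datetime import datetime

# Hypothetical raw records pulled from two different tools.
tracker_rows = [
    {"task_id": "T-101", "closed_on": "2021-03-05", "estimate_hrs": "8"},
    {"task_id": "T-102", "closed_on": "", "estimate_hrs": "n/a"},  # messy row
]
timesheet_rows = [
    {"task_id": "T-101", "logged_hrs": 11.5},
]

def clean(row):
    """Normalize one tracker row: parse dates, coerce numbers, null out bad values."""
    closed = row["closed_on"]
    estimate = row["estimate_hrs"]
    return {
        "task_id": row["task_id"],
        "closed_on": datetime.strptime(closed, "%Y-%m-%d").date() if closed else None,
        "estimate_hrs": float(estimate) if estimate.replace(".", "", 1).isdigit() else None,
    }

# Join the cleaned tracker data with the hours logged in the timesheet tool.
logged = {r["task_id"]: r["logged_hrs"] for r in timesheet_rows}
merged = [{**clean(r), "logged_hrs": logged.get(r["task_id"])} for r in tracker_rows]
```

Only after this kind of normalization and joining can features such as “estimated vs. logged hours” be computed reliably across sources.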
What are the algorithms used by Canary?
Once the data is preprocessed and ready, it becomes possible to decide which modeling strategy fits best. Of the different modeling strategies available for predictive analytics, Project Canary utilizes outlier detection algorithms to predict and flag risks. An outlier is a data point that diverges from the overall pattern of a sample.
While there are multiple approaches for identifying outliers like z-score or extreme value analysis, probabilistic and statistical models, and proximity-based models, these approaches are not effective in:
- Identifying which attributes are abnormal when the outlier has abnormal values in more than a specific number of its attributes
- Discovering accurate rules to detect outliers and their abnormal attributes.
Hence, Canary uses pattern-based outlier detection methods. These methods identify abnormal attributes after discovering the reliable frequent patterns that reflect the typical characteristics of the feature set. Along with this, we have also used the z-score and the isolation forest algorithm on some of the features to predict alerts. As we progress and build a good feedback mechanism, this feedback can also be used to train a supervised machine learning model for automated predictions.
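As an illustration of the simplest of these techniques, here is a minimal z-score sketch in plain Python. The metric (weekly bug reopens) and the threshold are hypothetical examples, not Canary’s actual configuration:

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Flag points whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [x for x in values if abs((x - mu) / sigma) > threshold]

# Hypothetical weekly bug-reopen counts for a project; the spike should be flagged.
weekly_reopens = [3, 4, 2, 5, 3, 4, 21, 3]
print(zscore_outliers(weekly_reopens))  # → [21]
```

Isolation forests and pattern-based methods are better suited to high-dimensional feature sets, where a single per-feature z-score cannot capture which combination of attributes is abnormal.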
How does Canary send the warnings?
The goal of Canary is to move away from static dashboards toward task-centric, actionable insights. Dashboards are a good decision-making tool but are typically time-consuming to interpret: a chart or an insight does not prescriptively tell the user what to do. As part of Canary, we wanted to give users short alert messages in the form of actionable cards in everyday applications (Outlook, Teams, etc.) without requiring them to navigate to another dashboard or tool to check what went wrong.
Moving away from dashboard-like applications requires a change in mindset and a different approach to understanding the underlying data structures and automation. Canary is implemented so that users do not need to track complex data visualizations; they get relevant insights right in their mailboxes in the form of compact adaptive cards.
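To illustrate the idea, here is a minimal sketch of building one such alert as an Adaptive Card payload. The card schema is Microsoft’s Adaptive Cards JSON format; the project name, alert text, and URL below are made up for the example:

```python
import json

def build_alert_card(project, signal, detail, dashboard_url):
    """Build a compact Adaptive Card dict for one flagged risk (illustrative fields)."""
    return {
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "weight": "Bolder", "text": f"Canary alert: {project}"},
            {"type": "TextBlock", "wrap": True, "text": f"{signal}: {detail}"},
        ],
        "actions": [
            {"type": "Action.OpenUrl", "title": "Open dashboard", "url": dashboard_url},
        ],
    }

card = build_alert_card(
    "Project Atlas",
    "Sprint velocity drop",
    "Velocity fell well below the trailing six-sprint average.",
    "https://example.com/dashboards/atlas",
)
print(json.dumps(card, indent=2))
```

A payload like this can be delivered to Outlook or Teams, so the user sees the flagged risk and a follow-up action in place, instead of hunting through a dashboard.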
Is Canary always helpful?
In the case of the coal mine, miners know exactly what action to take when a warning sounds. In a software engineering scenario, project managers need additional details on the suggested actions they can take based on the data. We plan to provide these in the form of action buttons, depending on the scenario. Engineering managers can then optionally navigate to the underlying traditional organization dashboards if needed to track and handle the situation.
Canary also has a feedback mechanism to seek input from users on the personalized alerts delivered to their mailboxes. Learning from this feedback will help the system improve accuracy and relevance.
Conclusion: Canary can be useful “beyond the mines”
Predictive project management using AI/ML is the future of project and program management. Not only can this solution help with predictive analytics, it can also help project managers plan more accurately, automate mundane tasks, and deliver higher value to customers. For us, Project Canary is just one step in our efforts to help engineering managers. Once Canary reaches a sufficient level of intelligence, it can also work with different sets of underlying data, including those beyond the project management space.
There is still a wealth of dark data yet to be explored. As dark data is a popular topic of interest, we will also share in an upcoming blog how Persistent is using this data in another larger context to improve and enable sales teams to do their jobs better.
If you think your industry has similar problems to solve and you have a set of unutilized data sitting in the dark, please reach out to us.