A Business Principal's guide to Machine Learning - Solutioning

A Business Principal needs all the tools and techniques that are available for solutioning.

With machine learning emerging as a key trend in the coming years, it makes sense to understand how to use it for solutioning.

I have drawn a lot of the material for this post from: The Step-By-Step PM Guide to Building Machine Learning Based Products

Where do I use Machine Learning?

Machine Learning is a solution - define the problem first

This is a caveat for any new tool or technology.
Machine learning is not something that we use because it is new. It is something we should use because it fits our use case and helps us achieve the outcome.
Internal Process
  • Where do people in my company today apply knowledge to make decisions that could be automated, so their skills could be better leveraged elsewhere?
  • What is the data that people in my company normally search for, collect or extract manually from certain repositories of information and how can this be automated?
Products & Experience
  • What parts of my customer interactions are customized by people and could potentially be customized by machines?
  • Do I have a clear segmentation of my customers based on their preferences, behaviors and needs? Is my product / experience customized for each segment?
New Customers
  • Do I have any data that could be useful to other stakeholders in the industry or in adjacent industries? What sort of decisions can it help these stakeholders make?
Where do I start?

Some Terminologies

Algorithm: The procedure by which a machine learns. It starts by looking at a given set of inputs and the set of outputs that correspond to those inputs, and identifies the patterns that connect them.
Model: The set of patterns and rules the machine has learned in order to solve a specific problem. Models are not 100% correct, but are rather “best guesses” given the amount of data the model has seen. The more data the model has seen, the more likely it is to give useful output.
Training Set: The set of known inputs and outputs that the data scientist uses to “train” the machine, i.e. to let the model identify patterns in the data and create rules.
Validation Set: A fresh set of data, not used in training, with known inputs and outputs. We run the candidate models on the validation set inputs to see which one gives results that are closest to the validation set outputs.
Test Set: Another fresh set of data, used only once the best model has been chosen, to check how good that model really is at solving the problem.
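To make the three data sets concrete, here is a minimal sketch of how a set of known inputs and outputs could be divided. The 60/20/20 proportions, the function name `split_dataset` and the example data are assumptions for illustration, not something prescribed by the source guide.

```python
import random


def split_dataset(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever is left over
    return train, validation, test


# Hypothetical labeled examples: (input, known output)
examples = [(x, 2 * x + 1) for x in range(100)]
train, validation, test = split_dataset(examples)
print(len(train), len(validation), len(test))
```

The point to notice is that no row appears in more than one set, so the validation and test sets really are "fresh" data for the model.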
Types of Learning

Supervised Learning: A type of learning where an algorithm needs to see a lot of labeled data examples — data that is comprised of both inputs and the corresponding output, in order to work. The “labeled” part refers to tagging the inputs with the outcome the model is trying to predict.
Problems used for:
  • Regression. Inferring the value of an unknown variable from other pieces of data that can reasonably be expected to have an effect on that variable.
  • Classification. Identifying which category an entity belongs to out of a given set of categories.
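The difference between the two supervised problem types can be sketched with a few lines of plain Python. Both methods below (a least-squares straight line for regression, a nearest-centroid rule for classification) are deliberately simple stand-ins chosen for illustration, and the house-price and customer-spend figures are made up.

```python
def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_centroids(xs, labels):
    """Classification: remember the average input value of each category."""
    grouped = {}
    for x, label in zip(xs, labels):
        grouped.setdefault(label, []).append(x)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


def classify(centroids, x):
    """Assign x to the category whose average is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: predict a number (hypothetical house size -> price)
slope, intercept = fit_line([50, 70, 90, 110], [150, 210, 270, 330])
predicted_price = slope * 100 + intercept

# Classification: predict a category (hypothetical monthly spend -> segment)
centroids = fit_centroids([10, 20, 30, 200, 220, 240],
                          ["low", "low", "low", "high", "high", "high"])
predicted_segment = classify(centroids, 180)
print(predicted_price, predicted_segment)
```

In both cases the labeled examples carry the answer (a price, a segment); what changes is whether the model outputs a number or a category.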
Unsupervised Learning: The algorithm tries to identify patterns in the data without the need to tag the data set with the desired outcome.
Problems used for:
  • Clustering. Given a certain similarity criterion, find which items are more similar to one another.
  • Association. Categorize objects into buckets based on some relationship, so that the presence of one object in a bucket predicts the presence of another.
  • Anomaly detection. Identifying unexpected patterns in data that need to be flagged and handled.
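As one illustration of an unsupervised problem, anomaly detection can be sketched without any labels at all. The version below flags values that sit far from the average, measured in standard deviations (a z-score). The threshold of 2.0, the function name and the order counts are assumptions for illustration; real systems often use more robust methods.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    average = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - average) / spread > threshold]


# Hypothetical daily order counts; nobody has tagged which day is unusual
daily_orders = [100, 102, 98, 101, 99, 103, 97, 250]
anomalies = find_anomalies(daily_orders)
print(anomalies)
```

Nothing in the input says which day is "wrong"; the pattern in the data itself is what singles out the unusual value.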
Semi-supervised Learning: This is a hybrid between supervised and unsupervised learning, where the algorithm requires some labeled training data, but a lot less than in the case of supervised learning.
Reinforcement Learning: The algorithm starts with a limited set of data and learns as it gets more feedback about its predictions over time.
Selection of a solution

Algorithm Selection:
  • Select an algorithm based on the problem
  • Based on the problem, decide on the type of learning that you expect (supervised vs unsupervised) and pick the relevant algorithm
Feature Selection:
  • Features are also called variables or attributes
  • They are independent pieces of data that are used to describe the pattern we are trying to identify or predict
  • We pick the features based on the problem we are trying to solve
Objective function selection:
  • This is the outcome the model is trying to predict
  • This depends on the business goal 
  • Also depends on the data available. Based on the data available, the objective function can be refined to more precisely predict the outcome
Explainability:
  • The output of ML models is typically cryptic
  • The models are a black box
  • If we use a clustering model, there is no easy way for the end user to understand what each cluster really means, which makes it hard for them to take a decision on that problem.


Pitfalls to avoid
Overfitting:
  • If a model fits the training data too closely, it might mimic the noise in the data as well
  • This leads to inaccurate predictions on new data, so a model that looks excellent on the training set might not be a good model
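Overfitting is easier to see with numbers. The sketch below compares a model that simply memorises every training row (and answers with the nearest one) against a plain straight-line fit. The data is made up: the true relationship is assumed to be y = 2x, and the training outputs carry noise of +2 or -2. Both model choices are illustrative assumptions.

```python
def mean_absolute_error(predict, rows):
    """Average distance between predictions and known outputs."""
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)


def fit_memorizer(rows):
    """Overfitted model: memorise every training row, answer with the nearest one."""
    def predict(x):
        nearest_x, nearest_y = min(rows, key=lambda row: abs(row[0] - x))
        return nearest_y
    return predict


def fit_line(rows):
    """Simpler model: least-squares straight line through the training rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical data: true pattern y = 2x, training outputs are noisy
train = [(1, 4), (2, 2), (3, 8), (4, 6), (5, 12), (6, 10)]
validation = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8), (4.6, 9.2)]

memorizer = fit_memorizer(train)
line = fit_line(train)

memorizer_train_error = mean_absolute_error(memorizer, train)
memorizer_validation_error = mean_absolute_error(memorizer, validation)
line_train_error = mean_absolute_error(line, train)
line_validation_error = mean_absolute_error(line, validation)

print("memorizer:", memorizer_train_error, memorizer_validation_error)
print("line:     ", line_train_error, line_validation_error)
```

The memorising model has zero error on the training data because it has copied the noise, yet it does worse than the straight line on data it has not seen. That gap between training and validation performance is the usual warning sign.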
Precision and recall:
  • Precision is the share of true positive predictions out of all positive predictions for a specific outcome
  • Recall is the share of true positive predictions out of all actual positive outcomes in the data
  • There needs to be a business decision on whether the ML model should favour precision or recall, since improving one usually comes at the cost of the other
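The two definitions can be checked with a small worked example. The function name and the fraud-style data below are hypothetical; 1 marks the positive outcome.

```python
def precision_recall(actual, predicted, positive=1):
    """Compute precision and recall for the outcome labelled `positive`."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if p == positive and a == positive)
    false_positives = sum(1 for a, p in pairs if p == positive and a != positive)
    false_negatives = sum(1 for a, p in pairs if p != positive and a == positive)
    predicted_positives = true_positives + false_positives
    actual_positives = true_positives + false_negatives
    precision = true_positives / predicted_positives if predicted_positives else 0.0
    recall = true_positives / actual_positives if actual_positives else 0.0
    return precision, recall


# Hypothetical example: 4 real fraud cases among 10 transactions
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the model flags 3 transactions and 2 of them are real fraud (precision of 2 out of 3), but it only finds 2 of the 4 real fraud cases (recall of 2 out of 4). Whether missing fraud or raising false alarms is more costly is the business decision.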

Measuring model accuracy
Metrics:
  • Model accuracy cannot be taken at face value, since the training data could be skewed compared to the actual real-time data
  • A model can predict with different accuracy on different data segments. Averaging all of these accuracies into a single model accuracy metric can be misleading
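A short sketch shows how a single accuracy figure can hide a weak segment. The segment names and counts below are invented for illustration.

```python
from collections import defaultdict


def accuracy(pairs):
    """Share of (actual, predicted) pairs where the prediction was right."""
    pairs = list(pairs)
    return sum(1 for actual, predicted in pairs if actual == predicted) / len(pairs)


def accuracy_by_segment(records):
    """Group (segment, actual, predicted) records and score each segment separately."""
    by_segment = defaultdict(list)
    for segment, actual, predicted in records:
        by_segment[segment].append((actual, predicted))
    return {segment: accuracy(pairs) for segment, pairs in by_segment.items()}


# Hypothetical results: many retail customers, few enterprise customers
records = (
    [("retail", 1, 1)] * 81 + [("retail", 1, 0)] * 9
    + [("enterprise", 1, 1)] * 3 + [("enterprise", 1, 0)] * 7
)

overall = accuracy((actual, predicted) for _, actual, predicted in records)
per_segment = accuracy_by_segment(records)
print(overall, per_segment)
```

The overall figure of 84% looks healthy, but it is driven by the large retail segment at 90%, while the enterprise segment is right only 30% of the time. If enterprise customers matter most to the business, the headline number is misleading.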


Building a good ML Model
  • Building an ML Model is actually building a product that would in turn have an ML model.
  • Building an ML model in isolation should be avoided.
Ideation:
  • Find out the real business problem or challenge
  • Align all stakeholders
  • Choose an objective function - the specific outcome that would help you with the problem. 
  • Define what a good model should do for you: is it accuracy, precision or recall?
  • Come up with a tentative list of inputs that you expect the model will need in order to provide the outcome.
In my experience, this is the most important phase of interaction for the Business Principal. We need to make sure that we have alignment on the business problem and have all the stakeholders call out the list of inputs. 

Once we have that list of inputs, we can move to the next phase.

Data Preparation:
  • Look at the data available today
  • Find out the gaps in data available versus data needed for the model
  • Try to balance between various data acquisition methods like connecting to a database, uploading data or building a data lake with queues
  • Clean up the data and translate the data into the format that the model needs to work
  • Look for ways to reuse - e.g. libraries that are already available
  • This might require us to break down the data ingestion into steps where other tools can be reused
  • This step is iterative and we will repeat it multiple times before we get this right
In this phase, the Business principal is interacting with various other stakeholders in terms of data availability. It is important that the business principal owns the entire product so that the prototype comes out successful and the business outcome is achieved.

The business principal needs to work through and figure out all possible ways to get the initial set of data for the model to run.

Prototyping and testing:
  • Build a prototype and experiment with the most important feature first
  • Try different algorithms for the most important feature
  • Repeat till you get the most desirable set of outputs
  • Test the performance of the model with the validation data set to calibrate and make sure it works
  • Iterate till you get the right algorithms that provide the desired model performance
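The loop described in this list (try several algorithms, compare them on the validation set, keep the best) can be sketched as follows. The two candidate algorithms, the error measure and the data are assumptions chosen to keep the example small; a real prototype would compare richer candidates.

```python
def mean_absolute_error(predict, rows):
    """Average distance between predictions and known outputs."""
    return sum(abs(predict(x) - y) for x, y in rows) / len(rows)


def fit_average(rows):
    """Candidate 1: always predict the average training output."""
    average = sum(y for _, y in rows) / len(rows)
    return lambda x: average


def fit_line(rows):
    """Candidate 2: least-squares straight line."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical (input, output) rows, already split into the three sets
train = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8), (5, 11.0)]
validation = [(6, 13.1), (7, 14.8)]
test = [(8, 17.2)]

candidates = {"average": fit_average, "straight_line": fit_line}

# Train every candidate on the training set only
models = {name: fit(train) for name, fit in candidates.items()}

# Pick the candidate with the lowest error on the validation set
validation_errors = {name: mean_absolute_error(model, validation)
                     for name, model in models.items()}
best_name = min(validation_errors, key=validation_errors.get)

# Only now touch the test set, to estimate how good the chosen model really is
test_error = mean_absolute_error(models[best_name], test)
print(best_name, validation_errors, test_error)
```

Keeping the test set out of the selection step is the design choice that matters: it means the final error estimate has not been influenced by the choice of algorithm.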
In this phase, the Business Principal is tasked with helping the business understand the output of the machine learning models. 

This is where the challenge of explainability comes in. The models cannot remain mysterious and should not need experts for decision making.

Productisation:
  • Once you have built a model prototype, you need to figure out a way to scale
  • Add more features
  • Add more datasets
  • Scale data collection to more data sources
  • Refresh data using data pipelines
  • Scale models to include additional business scenarios
In this phase, the Business principal's main involvement is in translating the intent from the prototype to make sure that the product that gets built adds business value.


Product Outliers:
  • As you scale, you might find that the ML model starts providing insights that you did not expect
  • This could be because you increase data coverage or your model predicts more business scenarios
  • Watch out for these outliers and go back to product design

In this phase, the Business principal works with the Product management team and the IT team to ensure that prototyping happens continuously for the business challenges to be met.
