AI Model Development Process – From A to Z

Initial steps

Define – Question the client & the business team

Identify the problem
Understand the ideal outcome and the “good enough” outcome
Define the metrics to be evaluated and their importance
Learn the business logic
Understand the functional & non-functional requirements
Find weaknesses & special constraints
List assumptions

Data

Identify the sources of data
Learn the data
- Look at examples and their labels – try to learn which factors drive them
- Make sure the data is labeled consistently
Spot weaknesses/problems in the data – class imbalances, missing values (N/A)
Look for useful additional features
- derived from existing features (e.g., phone number → area)
- easily obtained external features (e.g., holidays)
Create a processing strategy (feature → what is required)
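
The two kinds of extra features mentioned above (derived from an existing field, and cheaply obtained from outside) can be sketched as follows. This is a minimal illustration, not a prescribed method: the lookup tables `AREA_BY_PREFIX` and `HOLIDAYS`, the record fields, and the assumption that the first three digits of a phone number identify an area are all hypothetical.

```python
from datetime import date

# Hypothetical lookup tables; a real project would load these from a
# maintained source (a prefix table, a national holiday calendar).
AREA_BY_PREFIX = {"212": "New York", "415": "San Francisco"}
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def area_from_phone(phone: str) -> str:
    """Derive a coarse location feature from the first three digits."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    return AREA_BY_PREFIX.get(digits[:3], "unknown")


def is_holiday(day: date) -> bool:
    """Flag dates that appear in the holiday calendar."""
    return day in HOLIDAYS


def enrich(record: dict) -> dict:
    """Return a copy of the record with the two derived features added."""
    enriched = dict(record)
    enriched["area"] = area_from_phone(record["phone"])
    enriched["is_holiday"] = is_holiday(record["date"])
    return enriched


row = {"phone": "(415) 555-0100", "date": date(2024, 12, 25)}
print(enrich(row))
```

Returning "unknown" for unmapped prefixes keeps the feature usable when the lookup table is incomplete, which is one way to record the "feature → what is required" decision explicitly.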

Formulate Your Problem as an ML Problem

Articulate your problem as an optimization problem
Think about potential bias
Frame your problem – classification, regression, anomaly detection, etc.
Choose 1-3 initial features
Test ability to learn – correlations, Predictive Power Score (PPS)
- noisy labels, ability to generalize, enough examples
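
A quick way to test whether an initial feature carries any signal is to correlate it with the target. The sketch below computes plain Pearson correlation by hand; it only captures linear relationships and is not the Predictive Power Score, which is a separate measure. The toy data and feature names are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


# Made-up toy data: one informative feature and one constant feature.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "informative": [2.0, 4.1, 5.9, 8.2, 9.8],
    "constant": [3.0, 3.0, 3.0, 3.0, 3.0],
}
for name, values in features.items():
    print(f"{name}: r = {pearson(values, target):+.3f}")
```

A correlation near zero does not prove a feature is useless (the relationship may be non-linear), so this is best treated as a first filter rather than a verdict.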

More points to look at:

Start with a problem and not the solution
ML is not always the solution
Go from simple → complex

EDA

Get to know the data

This part is tricky: it is easy to spend precious time here without producing any output.

The most important principle in EDA is to:

And a few more important points:

Use visualizations to learn about distributions, outliers, and more
If possible, use interactive visualization with Bokeh or Streamlit
Write your conclusions after this step in your notebook or .md file
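
Alongside the plots, a small numeric check can point at the values worth looking at first. The sketch below flags outliers with the common 1.5 × IQR rule; the rule, the threshold `k`, and the sample values are assumptions chosen for illustration, not something this process prescribes.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up measurements with one obviously suspicious value.
durations = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(durations))
```

Whatever the check flags still needs a human decision: a flagged value may be a data error, or it may be exactly the rare case the model has to handle.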

Data Processing

Build the data processing into a reusable pipeline

This avoids leakage and keeps the transformation consistent
If the transformation changes between runs, models cannot be compared
Save and version transformed data
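
The leakage point can be made concrete with a tiny pipeline whose statistics are fitted on the training split only and then reused unchanged on the test split. This is a hand-rolled sketch under simple assumptions (a single numeric column, standard scaling); the `Standardizer` and `Pipeline` classes are illustrative and not taken from any particular library.

```python
import statistics


class Standardizer:
    """Scale a numeric column to zero mean and unit variance."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Population std; fall back to 1.0 so a constant column
        # does not cause a division by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


class Pipeline:
    """Apply a fixed sequence of steps; fit only ever sees training data."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


train = [10.0, 20.0, 30.0]
test = [40.0]

pipeline = Pipeline([Standardizer()]).fit(train)  # statistics come from train only
train_scaled = pipeline.transform(train)
test_scaled = pipeline.transform(test)            # reuses the train mean and std
print(train_scaled, test_scaled)
```

Because the fitted object carries its own parameters, the same transformation can be applied to every model being compared and later to production inputs.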

Finished a transformation?

Save the processed data and version it
- Compare models on similar data
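
One lightweight way to version processed data is to name each saved file after a hash of its contents and the transformation settings that produced it. The sketch below assumes the processed rows are JSON-serializable; the function name `save_versioned`, the file naming scheme, and the example config keys are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def save_versioned(rows, transform_config, out_dir):
    """Write processed rows to a file named after a hash of data plus config."""
    payload = json.dumps(
        {"config": transform_config, "rows": rows}, sort_keys=True
    ).encode("utf-8")
    version = hashlib.sha256(payload).hexdigest()[:12]
    path = Path(out_dir) / f"processed_{version}.json"
    path.write_bytes(payload)
    return version, path


rows = [{"x": 0.5, "label": 1}, {"x": -1.2, "label": 0}]
config = {"scaler": "standard", "drop_na": True}

with tempfile.TemporaryDirectory() as tmp:
    v1, p1 = save_versioned(rows, config, tmp)
    v2, _ = save_versioned(rows, config, tmp)  # same inputs
    v3, _ = save_versioned(rows, {**config, "drop_na": False}, tmp)  # changed config
    print(v1 == v2, v1 == v3)
```

Recording the version string next to each model's results makes it possible to tell afterwards whether two models were trained on the same processed data.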

Modeling

Create a baseline model

Use AutoML tools or a simple heuristic
If not – create a simple model such as linear or logistic regression
Start from a simple feature space
- Look for correlated features
  - Remove redundant features with L1 regularization
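
As an example of the "simple heuristic" option, the sketch below builds a baseline that always predicts the most frequent training label. It assumes a classification task with made-up labels; any later model should beat this number to justify its complexity.

```python
from collections import Counter


class MajorityBaseline:
    """Always predict the most frequent training label."""

    def fit(self, labels):
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        return [self.prediction_] * n


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, imbalanced labels.
y_train = [0, 0, 0, 1, 0, 1, 0, 0]
y_test = [0, 1, 0, 0]

baseline = MajorityBaseline().fit(y_train)
baseline_acc = accuracy(y_test, baseline.predict(len(y_test)))
print(f"baseline accuracy: {baseline_acc:.2f}")
```

On imbalanced data this baseline can look deceptively strong on accuracy, which is a useful reminder to evaluate with the metrics agreed on in the definition step.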

Design experiments

Small-stepped, simple, well-defined experiments
Document the “why”
Look at metrics, assumptions, limitations, and state
Set a research goal
Make a hypothesis
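
The items above can be captured in a small, structured experiment record so that the "why" is written down before the run rather than reconstructed afterwards. The field names and example values below are illustrative assumptions, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Experiment:
    name: str
    goal: str        # the research goal this experiment serves
    hypothesis: str  # what we expect to happen
    why: str         # the reasoning behind trying it
    change: str      # the single thing that differs from the previous run
    metrics: dict = field(default_factory=dict)
    assumptions: list = field(default_factory=list)
    limitations: list = field(default_factory=list)

    def to_json(self):
        """Serialize the record so it can be stored next to the results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Made-up example entry.
exp = Experiment(
    name="exp-003",
    goal="Reach the agreed 'good enough' recall",
    hypothesis="Adding the holiday flag improves recall on weekend traffic",
    why="Error analysis showed most misses fall on holidays",
    change="Added is_holiday feature; everything else unchanged",
    assumptions=["Holiday calendar is complete for the training period"],
)
exp.metrics["recall"] = 0.0  # placeholder, to be filled in after the run
print(exp.to_json())
```

Keeping one changed thing per record is what makes the experiments small-stepped: if the metric moves, the cause is unambiguous.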

Train a model

Evaluate the results and analyze
- Check for failures
- Visualize results
Think about solutions
- Look at false positives and false negatives – ask why the model failed
  - Visualize the mistakes where possible
- Look for noisy features
  - Use L1 regularization
- Think of a better model for the task
Refine the hypothesis and repeat
- Better results
  - Hitting the “good enough” – move on
  - If not – add more features, more data, more complexity
- Worse results
  - Debug the model
  - Test another direction
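
For the error-analysis step, it helps to pull out the actual false positives and false negatives rather than only looking at aggregate scores. The sketch below assumes binary labels where 1 is the positive class; the example inputs are made up.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def split_errors(examples, y_true, y_pred):
    """Return the inputs behind false positives and false negatives."""
    rows = list(zip(examples, y_true, y_pred))
    false_pos = [ex for ex, t, p in rows if p == 1 and t == 0]
    false_neg = [ex for ex, t, p in rows if p == 0 and t == 1]
    return false_pos, false_neg


# Made-up evaluation set.
examples = ["a", "b", "c", "d", "e"]
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
false_pos, false_neg = split_errors(examples, y_true, y_pred)
print(counts, false_pos, false_neg)
```

Reading through the returned examples one by one is usually where the next hypothesis comes from.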

After that

Optimize hyperparameters
- Optuna or other tools
Write smoke tests
Write unit tests (optional)
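
To show the shape of the hyperparameter optimization step without depending on a specific tool, the sketch below uses plain random search from the standard library. It is a stand-in, not Optuna's API: tools such as Optuna use smarter sampling strategies. The objective `validation_loss` is a hypothetical placeholder; a real one would train a model and return a validation metric.

```python
import random


def validation_loss(learning_rate, regularization):
    """Stand-in objective (an assumption): lowest at lr=0.1, reg=0.01."""
    return (learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2


def random_search(objective, space, n_trials=200, seed=0):
    """Sample each hyperparameter uniformly from its range; keep the best trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            name: rng.uniform(low, high) for name, (low, high) in space.items()
        }
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


space = {"learning_rate": (0.001, 0.5), "regularization": (0.0, 0.1)}
best_params, best_loss = random_search(validation_loss, space)
print(best_params, best_loss)
```

Fixing the seed makes the search reproducible, which keeps tuned and untuned models comparable across runs.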

Production

When?
- The model reaches “good enough”
- Next phases
Clean and structure the code
- Simplify the input and processing
Optimize your model for serving
- optimize performance (if required)
- optimize for hardware
Wrap model
- as REST API or other serving option
- Dockerize if required
Design monitoring
- Model performance (evaluation metrics) – e.g., whether the user clicked or not
- Resources tracking – CPU/GPU
- Connect to automated alerts/reports system
Automated pipeline
- Define retraining policy
- Execute
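
Monitoring and the retraining policy can be tied together with a simple rule: track a rolling click rate and flag retraining when it falls too far below the rate measured at deployment. The sketch below is one possible policy under stated assumptions; the class name `RetrainingMonitor`, the window size, and the 20% relative drop threshold are hypothetical choices.

```python
from collections import deque


class RetrainingMonitor:
    """Track a rolling click rate and flag when it drops too far."""

    def __init__(self, baseline_rate, window=100, max_relative_drop=0.2):
        self.baseline_rate = baseline_rate
        self.max_relative_drop = max_relative_drop
        self.outcomes = deque(maxlen=window)  # keeps only the latest events

    def record(self, clicked):
        self.outcomes.append(1 if clicked else 0)

    def current_rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        # Wait for a full window so a few early events cannot trigger retraining.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        threshold = self.baseline_rate * (1 - self.max_relative_drop)
        return self.current_rate() < threshold


monitor = RetrainingMonitor(baseline_rate=0.30, window=10)
for clicked in [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]:  # made-up click log
    monitor.record(clicked)
print(monitor.current_rate(), monitor.should_retrain())
```

In a real deployment the `should_retrain` signal would feed the automated alerts or reports system, and a person or a scheduled job would decide when to execute the retraining pipeline.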
