AI Model Development Process – From A to Z

Initial steps

Define – Question the client & the business team

  • Identify the problem
  • Understand the ideal outcome and the “good enough” outcome
  • Define the metrics to be evaluated and their importance
  • Learn the business logic
  • Understand the functional & non-functional requirements
  • Find weaknesses & special constraints
  • List assumptions

Data

  • Identify the sources of data
  • Learn the data
    • Look at examples and tags – try to learn the factors
    • Make sure the data is labeled consistently
  • Spot weaknesses/problems in the data – imbalances, missing values (N/A)
  • Look for useful derived features
    • based on current features (e.g., phone number → area)
    • easily obtained features (e.g., holidays)
  • Create a processing strategy (for each feature → what processing it requires)
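
The two kinds of derived features listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the phone format (`+<country>-<area>-<number>`), the `HOLIDAYS` set, and all function names are hypothetical and would be replaced by whatever the real data requires.

```python
from datetime import date
from typing import Optional

# Hypothetical holiday calendar; a real project would load a maintained one.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def area_from_phone(phone: str) -> Optional[str]:
    """Extract the area part from '+<country>-<area>-<number>' (assumed format)."""
    parts = phone.split("-")
    if len(parts) != 3 or not parts[1].isdigit():
        return None  # unexpected format: leave the feature missing
    return parts[1]


def is_holiday(day: date) -> bool:
    """Easily obtained feature: does the date fall on a known holiday?"""
    return day in HOLIDAYS


def add_features(row: dict) -> dict:
    """Return a copy of the row with the two derived features added."""
    enriched = dict(row)
    enriched["area"] = area_from_phone(row["phone"])
    enriched["is_holiday"] = is_holiday(row["date"])
    return enriched


example = add_features({"phone": "+1-212-5550100", "date": date(2024, 12, 25)})
```

Returning `None` for an unparseable phone number keeps the problem visible as a missing value instead of hiding it behind a guess.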

Formulate Your Problem as an ML Problem

  • Articulate your problem as an optimization problem
  • Think about potential bias
  • Frame your problem – classification/regression/anomaly detection, etc.
  • Choose 1-3 initial features
  • Test the ability to learn – correlations, Predictive Power Score (PPS)
    • Check for noisy labels, the ability to generalize, and whether there are enough examples
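
A quick way to test whether the initial features carry any signal is to rank them by their correlation with the target. The sketch below uses plain Pearson correlation, which only captures linear relationships; a predictive-power score is meant to catch non-linear ones as well and is not implemented here. The function names and the toy data are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Sort (name, correlation) pairs by absolute correlation, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data: 'constant' should end up last because it cannot explain the target.
features = {"rising": [1, 2, 3, 4], "constant": [1, 1, 1, 1], "falling": [4, 3, 2, 1]}
target = [2, 4, 6, 8]
ranking = rank_features(features, target)
```

If no feature shows any relationship with the target, that is a hint to revisit the labels or the data before investing in a model.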

More points to look at:

  • Start with a problem and not the solution
  • ML is not always the solution
  • Go from simple → complex

EDA

Get to know the data

This is a tricky part, where you might lose precious time without any output.

The most important principle in EDA is to:

And a few more important points:

  • Use visualizations to learn about distributions, outliers, and more
  • If possible, use interactive visualization with Bokeh or Streamlit
  • Write your conclusions after this step in your notebook or .md file
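
Alongside plots, a short numeric summary helps confirm what the visualizations suggest about distributions and outliers. The sketch below flags outliers with the common 1.5 × IQR rule; the rule, the function name, and the sample values are assumptions for illustration, not something this process prescribes.

```python
from statistics import mean, median, quantiles


def summarize(values):
    """Return basic statistics plus the values outside the 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(values),
        "median": median(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical column with one suspicious value.
summary = summarize([10, 11, 12, 13, 12, 11, 10, 95])
```

A large gap between the mean and the median, as in this sample, is often the first sign that a few extreme values dominate the column.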

Data Processing

Build the data processing into a reusable pipeline

  • So we avoid leakage and have a consistent transformation
  • If the transformation changes between runs, we cannot compare models fairly
  • Save and version transformed data
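
The leakage point can be made concrete with a tiny fit/transform pipeline: every statistic is learned from the training data only and then reused unchanged on new data. This is a hand-rolled sketch under stated assumptions; `FillMissing`, `MeanStdScaler`, and `SimplePipeline` are hypothetical names, and in practice a library pipeline would usually take their place.

```python
from statistics import mean, median, pstdev


class FillMissing:
    """Replace None with the median learned from the training data."""

    def fit(self, values):
        self.fill_ = median(v for v in values if v is not None)
        return self

    def transform(self, values):
        return [self.fill_ if v is None else v for v in values]


class MeanStdScaler:
    """Standardize using the mean and std learned from the training data."""

    def fit(self, values):
        self.mean_ = mean(values)
        self.std_ = pstdev(values) or 1.0  # avoid dividing by zero on constant columns
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


class SimplePipeline:
    """Apply steps in order; fit() learns from train, transform() only applies."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, values):
        for step in self.steps:
            values = step.fit(values).transform(values)
        return self

    def transform(self, values):
        for step in self.steps:
            values = step.transform(values)
        return values


train = [1.0, None, 3.0, 5.0]
test = [None, 100.0]
pipeline = SimplePipeline([FillMissing(), MeanStdScaler()]).fit(train)
test_transformed = pipeline.transform(test)  # uses train statistics only
```

Because `transform` never refits, the extreme value in `test` cannot influence the learned mean or standard deviation, which is exactly the leakage the pipeline is meant to prevent.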

Finished a transformation?

  • Save the processed data and version it
    • So that models are compared on the same data
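
One lightweight way to version processed data is to fingerprint its content, so that two experiments can be checked for having used identical inputs. The sketch below assumes the processed rows are JSON-serializable; `dataset_version` is a hypothetical helper, and dedicated data-versioning tools offer far more than this.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short, deterministic fingerprint of the processed rows."""
    # sort_keys makes the fingerprint independent of dictionary key order.
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


version_a = dataset_version([{"x": 1, "y": 2}])
version_b = dataset_version([{"y": 2, "x": 1}])  # same content, different key order
version_c = dataset_version([{"x": 1, "y": 3}])  # content changed
```

Storing this fingerprint next to each model's results makes it easy to refuse a comparison when the versions differ.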

Modeling

Create a baseline model

  • Use autoML tools, or a simple heuristic
  • If not – create a simple model such as linear or logistic regression
  • Start from a simple feature space
    • Look for correlated features
      • Remove redundant features with L1 regularization
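
A "simple heuristic" baseline can be as small as always predicting the most frequent class. The sketch below is one such baseline for a classification task; the class name, the labels, and the choice of accuracy as the metric are assumptions for the example.

```python
from collections import Counter


class MajorityClassBaseline:
    """Always predicts the most frequent label seen during fit()."""

    def fit(self, labels):
        self.prediction_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_rows):
        return [self.prediction_] * n_rows


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_labels = ["ok"] * 8 + ["fraud"] * 2
test_labels = ["ok", "ok", "fraud", "ok"]
baseline = MajorityClassBaseline().fit(train_labels)
baseline_accuracy = accuracy(test_labels, baseline.predict(len(test_labels)))
```

On imbalanced data this baseline already scores well on accuracy while never finding a single positive case, which is a useful reminder to pick the metric before comparing models against it.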

Design experiments

  • Small-stepped, simple, well-defined experiments
  • Document the “why”
  • Look at metrics, assumptions, limitations, and state
  • Set a research goal
  • Make a hypothesis

Train a model

  • Evaluate the results and analyze
    • Check for failures
    • Visualize results
  • Think about solutions
    • Look at false positives and false negatives – ask why the model failed
      • Visualize the mistakes where possible
    • Look for noisy features
      • Use L1 regularization
    • Think of a better model for the task
  • Refine the hypothesis and repeat
    • Better results
      • Hitting the “good enough” – move on
      • If not – add more features, more data, more complexity
    • Worse results
      • Debug the model
      • Test another direction
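
To look at false positives and false negatives, it helps to pull the failing examples out of the evaluation set so they can be inspected one by one. The sketch below assumes a binary task with a single positive label; `split_errors` and the sample data are hypothetical.

```python
def split_errors(examples, y_true, y_pred, positive=1):
    """Return (false_positives, false_negatives) as lists of the original examples."""
    false_positives, false_negatives = [], []
    for example, truth, prediction in zip(examples, y_true, y_pred):
        if prediction == positive and truth != positive:
            false_positives.append(example)
        elif prediction != positive and truth == positive:
            false_negatives.append(example)
    return false_positives, false_negatives


examples = ["a", "b", "c", "d"]
y_true = [1, 0, 1, 0]
y_pred = [1, 1, 0, 0]
false_positives, false_negatives = split_errors(examples, y_true, y_pred)
```

Reading the raw examples in each group is often the fastest way to spot a labeling problem or a missing feature.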

After that

  • Optimize hyperparameters
    • Optuna or other tools
  • Write smoke tests
  • Write unit tests (optional)
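
A smoke test only checks that the whole path runs end to end on a tiny input and returns output of the right shape; it does not judge model quality. In the sketch below, `train` and `predict` are hypothetical stand-ins for the project's real entry points, and the threshold "model" exists only to keep the example self-contained.

```python
def train(rows, labels):
    """Stand-in for the real training entry point (hypothetical)."""
    return {"threshold": sum(rows) / len(rows)}


def predict(model, rows):
    """Stand-in for the real prediction entry point (hypothetical)."""
    return [1 if row > model["threshold"] else 0 for row in rows]


def smoke_test():
    """Train and predict on a tiny dataset and check the basic output contract."""
    rows, labels = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
    model = train(rows, labels)
    predictions = predict(model, rows)
    assert len(predictions) == len(rows), "one prediction per input row"
    assert set(predictions) <= {0, 1}, "predictions must be valid class labels"
    return predictions


smoke_predictions = smoke_test()
```

Because it runs in well under a second, a test like this can be executed on every change, before any longer training job starts.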

Production

  • When?
    • The model reaches “good enough”
    • Next phases
  • Clean and structure the code
    • Simplify the input and processing
  • Optimize your model for serving
    • optimize performance (if required)
    • optimize for hardware
  • Wrap model
    • as REST API or other serving option
    • Dockerize if required
  • Design monitoring
    • Model performance (evaluation metrics) – e.g., whether the user clicked or not
    • Resources tracking – CPU/GPU
    • Connect to automated alerts/reports system
  • Automated pipeline
    • Define retraining policy
    • Execute
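
Monitoring and the retraining policy can be tied together with a rolling metric: record each outcome (for example, whether the user clicked) and trigger retraining when the recent success rate falls below a threshold. The sketch below is one simple policy among many; the class name, the window size, and the threshold are assumptions.

```python
from collections import deque


class MetricMonitor:
    """Tracks a rolling success rate and flags when retraining looks necessary."""

    def __init__(self, window, min_rate):
        self.outcomes = deque(maxlen=window)  # keeps only the most recent outcomes
        self.min_rate = min_rate

    def record(self, success):
        self.outcomes.append(1 if success else 0)

    def rate(self):
        """Success rate over the current window (1.0 while no data is recorded)."""
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def should_retrain(self):
        # Wait for a full window so a few early failures do not trigger retraining.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rate() < self.min_rate


monitor = MetricMonitor(window=4, min_rate=0.5)
for clicked in [True, True, False, False, False]:
    monitor.record(clicked)
needs_retraining = monitor.should_retrain()
```

The same check can feed the automated alerts or reports system, so that a drop in the metric is seen by a person even when retraining is fully automatic.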
