WDIS AI-ML Series: Module 2 Lesson 7: Model Training and Model Testing
Vinay Roy
Introduction
This chapter explains how machine learning models are trained, tested, improved, and selected, while avoiding overfitting, underfitting, and model drift, to ensure they deliver real business value.
In the last section, we explored how to identify a set of models that might help us solve a business problem. But just knowing which algorithms exist is not enough. The real challenge—and opportunity—comes in the next stage of the journey: training these models, testing their performance, improving them, and ultimately selecting the one that delivers the most value for your business.

Think of this stage as the playoffs of a sports league. Multiple teams (models) have qualified, but now they must compete head-to-head to see who makes it to the finals and who takes home the trophy.

2.7.1. How can we find a winning model, and how can we improve it?

Now that we have identified a set of models to try, as discussed in the last section, we can move on to the next steps in the machine learning pipeline, Steps 6 through 8, which include

  • Training the models
  • Evaluating their performance
  • Improving them, and finally
  • Selecting the winning model that best solves our business problem
[Figure: End-to-end machine learning framework, from PRD to data preparation to model selection and iteration]

This is where machine learning becomes real. The difference between a model that works in theory and one that creates business value comes down to how well we execute this stage.

Step 6: Train the ML Models: First, we have to train the model, much as we would teach a two-year-old a new task. Let us understand what training a model means.

Think of training a model like teaching a two-year-old to identify animals. You show the child many examples of cats, dogs, and birds and tell them the correct name each time. Over time, the child learns to associate the correct labels with the features. A machine learning model works the same way: it learns from historical examples where the “correct answer” is already known.

Splitting the Dataset: Training vs Testing

Training starts with splitting the dataset into two (in some cases, three) parts:

[Figure: K-fold cross-validation]

  • Training Set: Typically 70-80% of the full dataset. This is the portion the model learns from, like the notes and homework you study before an exam.
  • Test Set: The remaining 20-30%, which is held back and used only after the model has been trained. This is the final exam: data the model has not seen before and must generalize to. The reported result is the model’s performance on this test set, so we want to ensure the model sees it only once, simulating the real-world environment of unseen data.
  • Validation Set (Optional): In some workflows, we also set aside a third chunk known as the Validation Set (often carved out of the training data). This set is used during the training process to fine-tune model settings. In our previous analogy, think of it like doing mock tests and adjusting your study strategy based on feedback.
[Figure: Hold-out method for training machine learning models]

This process ensures that the model doesn’t just memorize the training data (which would lead to overfitting), but learns general patterns that apply broadly—even to data it hasn’t seen before.
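To make the split concrete, here is a minimal sketch using scikit-learn's train_test_split. The feature matrix X and target y are synthetic placeholders standing in for a prepared dataset, and the roughly 70/15/15 proportions are one common convention rather than a rule.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data purely to make the sketch runnable,
# e.g., 1,000 homes described by 5 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# First carve out the held-back test set (15%), then split the remainder
# into training (~70% of the total) and validation (~15% of the total).
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.1765, random_state=42)  # 0.1765 ≈ 15/85

print(len(X_train), len(X_val), len(X_test))  # roughly 700 / 150 / 150
```

The test set produced here would then be set aside and not touched again until the final evaluation.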

Executive Insight: If a model performs extremely well on the training set but poorly on the test set, it’s like a sales associate who can ace role-play sessions but fails miserably in front of a real customer.

To illustrate the concept of training, let us discuss a business case. We will revisit the house price prediction example introduced earlier as our working case study.

Step-by-step: Training with a House Price Example

1. Frame the business question. Predict the sale price of a home (in dollars) before it sells, so that as a prospective buyer in the market, we can bid the right price for the house.

2. Define target and features.

  • Target (y): Final sale price.
  • Features (X): Square footage, bedrooms, bathrooms, lot size, neighborhood, year built, renovation flag, distance to transit, school ratings, energy score, etc.
  • Avoid leakage: Be careful to only use information that would be available at the time of prediction. For example, when predicting the sale price before a transaction, do not include details only known after the sale (such as the final appraisal post‑closing). Using such data would give the model an unrealistic advantage and result in misleading performance estimates.

3. Assemble & clean data.Join MLS + tax records; deduplicate homes; fix obvious errors (e.g., 99 bedrooms); handle missing values; standardize units; encode categorical fields (e.g., one-hot for neighborhood)

4. Split the data.Training (~70%), Validation (~15%), Test (~15%), or use K-Fold cross-validation. Keep the test set untouched until the very end.

5. Establish a baseline.Predict median price per square foot by neighborhood (or a simple linear model). Record MAE/RMSE to know if ML beats a simple rule.

6. Choose a loss function aligned to the cost of errors.

  • MAE (Mean Absolute Error): robust to outliers; easy to explain in dollars.
  • RMSE (Root Mean Squared Error): penalizes large mistakes more.

Pick the one that best reflects your business penalty for big mispricing.
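To make the difference tangible, here is a small sketch using scikit-learn's metric functions on made-up sale prices; it shows how a single large miss inflates RMSE far more than MAE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up actual and predicted sale prices (in dollars) for five homes.
y_true = np.array([500_000, 450_000, 620_000, 380_000, 710_000])
y_pred = np.array([510_000, 440_000, 600_000, 390_000, 910_000])  # last prediction misses by $200k

mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"MAE:  ${mae:,.0f}")   # average miss in dollars; the outlier counts only once
print(f"RMSE: ${rmse:,.0f}")  # the $200k miss is squared, so RMSE is pulled up sharply
```

If a single badly mispriced home is very costly to the business, the RMSE-style penalty may be the better fit; if you mainly care about the typical error in dollars, MAE is easier to communicate.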

Now, with our data prepared, we train the model. This means feeding the training data into the algorithm so it can learn the relationships between inputs (like House Features) and outputs (like House Prices).

The goal is to minimize the difference between the model’s predictions and the actual results, a process guided by a mathematical function called a “loss function” or “objective function”. We discussed this in more detail in WDIS AI-ML Series: Module 2 Lesson 1: Objective function - AI is nothing but an optimization problem.
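As a minimal sketch of this “learn by minimizing a loss” idea, the snippet below fits an ordinary least-squares model with scikit-learn (which minimizes squared error internally) on placeholder data standing in for prepared house features and sale prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for prepared house features and sale prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([120.0, 80.0, 40.0, -30.0, 15.0]) + rng.normal(scale=25.0, size=1000)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Fitting = finding coefficients that minimize the squared-error loss on the training data.
model = LinearRegression().fit(X_train, y_train)

print("Training MAE:  ", mean_absolute_error(y_train, model.predict(X_train)))
print("Validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```

The validation score gives an early read on how well the learned relationship generalizes beyond the examples used for fitting.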

2.7.2. What Happens During Training?

During training, the algorithm iteratively adjusts its internal parameters (for example, the coefficients of a regression model) to reduce the loss function on the training data. In the next step, we’ll explore how to measure how well each model has learned and begin the process of selecting a winner.

Before training begins, we configure each model with specific settings or hyperparameters (like learning rate, number of layers, and regularization strength), which greatly influence performance. Throughout training, we watch for two common pitfalls:

  • Overfitting: The model memorizes the training data but performs poorly on new (Test) data.
  • Underfitting: The model fails to learn the patterns in the training data, resulting in poor performance everywhere.

Example: If we’re building a churn model, we would feed historical customer records into the training phase. The model then learns patterns that differentiate customers who stayed from those who left. This trained model can later predict which current customers might be at risk of leaving based on their latest behavior.

We will dedicate a separate chapter to overfitting and underfitting later. Until then, it is enough to know that overfitting means the model has learned the training data too well but does poorly on test data, while underfitting means the model has not captured the patterns in the training data at all, as illustrated below.
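One simple, practical check is to compare performance on the training and test splits: a large gap suggests overfitting, while poor scores on both suggest underfitting. The sketch below uses synthetic data and two deliberately extreme decision trees to show both patterns; the data and model choices are illustrative assumptions, not a recommendation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = X[:, 0] * 100 + rng.normal(scale=50, size=500)   # noisy synthetic target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# An unconstrained tree can memorize the training data (overfitting) ...
deep_tree = DecisionTreeRegressor(max_depth=None, random_state=1).fit(X_train, y_train)
# ... while a heavily constrained tree may fail to learn the pattern at all (underfitting).
stump = DecisionTreeRegressor(max_depth=1, random_state=1).fit(X_train, y_train)

for name, m in [("deep tree", deep_tree), ("stump", stump)]:
    train_mae = mean_absolute_error(y_train, m.predict(X_train))
    test_mae = mean_absolute_error(y_test, m.predict(X_test))
    print(f"{name}: train MAE={train_mae:.1f}, test MAE={test_mae:.1f}")
```

The deep tree scores nearly perfectly on training data yet much worse on test data (overfitting); the stump scores poorly on both (underfitting).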

Step 6 (continued): Report the Results of ML Models on Evaluation Metrics:

  • Evaluate the trained models on the validation data using relevant metrics such as accuracy, precision, recall, and F1 score (classification), or Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) (regression). We discussed these in more detail in WDIS AI-ML Series: Module 2 Lesson 1: Objective function - AI is nothing but an optimization problem. A short metrics sketch follows this list.
  • Example: Reporting precision and recall to evaluate fraud detection models, emphasizing scenarios where minimizing false negatives is critical.
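To make this reporting step concrete, here is a minimal sketch computing precision, recall, and F1 with scikit-learn on made-up fraud labels; in a real project these would be the validation-set labels and the model’s predictions.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up labels for a fraud-detection validation set: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Precision:", precision_score(y_true, y_pred))  # of flagged transactions, how many were truly fraud
print("Recall:   ", recall_score(y_true, y_pred))      # of actual fraud cases, how many were caught
print("F1 score: ", f1_score(y_true, y_pred))          # harmonic mean of precision and recall
```

For fraud detection, a missed fraud (false negative) is usually costlier than a false alarm, which is why recall is often weighted heavily in the report.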

Step 7: Improve ML Models to Enhance Performance:

  • Identify models that can be refined for better performance through techniques such as hyperparameter tuning (Grid Search, Random Search, Bayesian Optimization); a small grid-search sketch follows this list.
  • Example: Adjusting hyperparameters like learning rate, tree depth, or number of estimators in an XGBoost model to improve predictive accuracy for customer churn predictions.
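As one deliberately small illustration of grid search with cross-validation, the sketch below tunes two hyperparameters of scikit-learn’s GradientBoostingClassifier on synthetic churn-like data. The grid, data, and choice of model are assumptions for the sketch (the text’s XGBoost example would follow the same pattern), and Random Search or Bayesian Optimization may explore larger spaces more efficiently.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a churn dataset: 1 = churned, 0 = stayed.
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 8))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=600) > 0).astype(int)

# Small grid over two influential hyperparameters; real grids are usually larger.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3, 4],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=7),
    param_grid,
    scoring="f1",   # choose a metric aligned with the business objective
    cv=5,
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated F1:", round(search.best_score_, 3))
```

The winning settings are then refit on the full training data and evaluated once on the untouched test set.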

Step 8: Identify the Winning Model:

  • Select the final model based on overall performance, interpretability, computational efficiency, and alignment with business objectives.
  • Validate its robustness by testing on previously unseen datasets to confirm consistent performance.

Example: Finalizing a customer churn prediction model after verifying its accuracy and reliability across diverse customer segments.

2.7.3. How do we know the model will stay the winner?

Model Drift Definition: Model drift occurs when the predictive performance of a machine learning model deteriorates over time due to changes in the underlying data patterns or distribution. Regularly monitoring for drift, and addressing it when it appears, ensures sustained model accuracy and reliability, and continued alignment with business objectives.

  • Types of Drift:
    • Concept Drift: Occurs when the relationship between input features and the output variable changes over time.
    • Data Drift: Happens when the input data distribution itself changes, affecting model performance.
  • Detecting and Addressing Drift:
    • Implement regular monitoring procedures and automated alerts for detecting drift early (a simple data-drift check is sketched after this list).
    • Periodically retrain models with the latest data to maintain predictive accuracy.
    • Example: Continuously monitoring a fraud detection model and retraining quarterly to adapt to evolving fraud patterns and behaviors.
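One simple way to operationalize drift monitoring is to statistically compare the feature distribution the model was trained on against what it currently sees in production. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy on a single made-up feature; this is just one possible check, and real monitoring systems typically track many features and performance metrics together.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)

# Made-up distributions of one input feature (e.g., transaction amount).
training_values = rng.normal(loc=100, scale=20, size=5000)    # what the model was trained on
production_values = rng.normal(loc=130, scale=25, size=5000)  # what it sees today (shifted)

stat, p_value = ks_2samp(training_values, production_values)

# A very small p-value indicates the two distributions differ: a data-drift signal
# that could trigger an alert and, eventually, retraining on fresh data.
if p_value < 0.01:
    print(f"Drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected")
```

In practice, such checks run on a schedule, and repeated drift signals feed the retraining cadence described above.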

Understanding and proactively managing model drift help businesses ensure their machine learning models remain effective, trustworthy, and valuable over the long term, safeguarding business decisions and strategic investments.

Closing Thought

Training and testing is where machine learning transitions from experimentation to impact. The best model is not the most complex one—it is the one that performs reliably, aligns with business costs, and continues delivering value over time.

Understanding this stage is what separates organizations that build models from organizations that build durable AI capabilities.

About the author:
Vinay Roy
Fractional AI / ML Strategist | ex-CPO | ex-Nvidia | ex-Apple | UC Berkeley