
Topic 1: Data Understanding: Exploratory Data Analysis (EDA)

In the last module, Lesson 5.3, we discussed how data can be collected in a central repository called a Data Warehouse. The question now arises: how do we use the data in the Data Warehouse and/or Data Lake to build Machine Learning (ML) models?
However, before we build models, we need to gain a deeper understanding of our data. The process of gaining this deeper understanding by exploring the data is called Exploratory Data Analysis (EDA).
Exploratory Data Analysis (EDA) is an initial phase of any data science project and a critical step in the data analysis process. It is used to understand the underlying structure, patterns, and relationships within a dataset before formal modeling or hypothesis testing. It's like detective work: you delve into your data to understand its characteristics, identify patterns, and uncover potential insights. Examples of EDA include looking at summary statistics such as the mean and median to detect outliers, drawing histograms to see the distribution of the data, and plotting two features against each other on a two-dimensional plot to explore their relationship.
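As a minimal sketch of these EDA steps, assume a tiny made-up table with two hypothetical columns, `age` and `income` (the values are illustrative and do not come from any real dataset). The snippet computes summary statistics, the counts behind a histogram, and the correlation between the two features.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "age": [25, 30, 35, 40, 45, 50],
    "income": [30000, 35000, 40000, 45000, 50000, 300000],
})

# Summary statistics: a mean far above the median hints at outliers or skew.
income_mean = df["income"].mean()
income_median = df["income"].median()

# Histogram counts (the numbers a histogram plot would display).
counts, bin_edges = np.histogram(df["income"], bins=3)

# Strength of the linear relationship between two features.
age_income_corr = df["age"].corr(df["income"])

print(df.describe())
print("mean:", income_mean, "median:", income_median)
print("histogram counts:", counts)
print("correlation:", round(age_income_corr, 3))
```

Here the single very large income pulls the mean well above the median, which is the kind of signal that would prompt a closer look at that record.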
The more intimately your data science and leadership teams understand the data, the higher the ROI of your machine learning models will be.
You can read more about this in our article Data Science: Nurturing a data fluent culture that compounds growth.
Topic 2: Model Centric Vs Data Centric Approach to AI
In academic settings, machine learning courses often provide clean, well-organized datasets. Students then focus on finding the best ML model for these datasets. This approach, which emphasizes improving the model to achieve incremental gains, is called the Model-Centric Approach. It's particularly crucial for organizations in competitive environments where technological superiority is a key differentiator. In such cases, investing in cutting-edge models can be vital.

Strategic Benefits:
However, this approach assumes that the available data is of high quality and well suited to the task at hand. If the data is flawed, no amount of model sophistication will lead to the desired outcomes, i.e., garbage in, garbage out. Real-world data is often messy, unorganized, large, or in an odd format. This is where the data-centric approach comes in.
It is no wonder that, according to the 2020 State of Data Science survey by Anaconda, respondents reported spending on average 45% of their time getting data ready (collecting and cleansing) before they could use it to develop models and visualizations.

Most organizations can get a much better yield by improving the quality, relevance, and robustness of the data used to train AI models. This is called the Data-Centric Approach. Data is the most valuable asset an organization possesses, and maximizing its potential can yield significant strategic advantages.

Strategic Benefits:
This approach requires a deep commitment to data governance, quality assurance, and possibly a cultural shift within the organization to prioritize data as a key strategic asset.
We advise taking a Data-First, Model-Second Hybrid Approach: deliver incremental value to end customers and internal stakeholders by developing AI models while data quality is improved iteratively.

In the hybrid approach, the organization builds a data-first culture by focusing on improving data quality while, in parallel, using techniques such as the following (not an exhaustive list) to deliver value:
2.1 Curriculum Learning: Training on an easier dataset first and progressing to more complex datasets as the quality of the data improves.
2.2 Confident Learning: Focusing on identifying and correcting label errors in training data. It leverages the model’s predictions to estimate the confidence of each label in the dataset and detects when labels may be incorrect or noisy.
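To make the idea behind Confident Learning more concrete, here is a deliberately simplified sketch rather than the full method. It assumes we already have out-of-sample predicted probabilities from some model (the `pred_probs` array below is made up), and it flags an example as a suspected label error when the model is confident about a class other than the given label. The per-class threshold used here, the average confidence among examples carrying that label, is one common choice and should be read as an assumption.

```python
import numpy as np

# Hypothetical out-of-sample predicted probabilities (rows: examples, columns: classes).
pred_probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
    [0.10, 0.90],
    [0.95, 0.05],  # given label is 1, yet the model is confident in class 0
])
given_labels = np.array([0, 0, 1, 1, 1])

n_classes = pred_probs.shape[1]

# Per-class threshold: mean predicted probability of class k among examples labeled k.
thresholds = np.array([
    pred_probs[given_labels == k, k].mean() for k in range(n_classes)
])

# Mark where the predicted probability reaches the class threshold...
above = pred_probs >= thresholds
# ...but ignore the column of each example's own given label.
above[np.arange(len(given_labels)), given_labels] = False

# Examples that are confidently assigned to a different class are label-error suspects.
suspect_idx = np.where(above.any(axis=1))[0]
print("suspected label errors at rows:", suspect_idx.tolist())
```

Flagged rows would then be reviewed or relabeled by a person before retraining, which is how this technique feeds the iterative data-quality loop described above.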
Now that we know why a data-first mindset is needed, feature engineering becomes important. But before that, let us understand features.
Topic 3: Features
Features are individual measurable properties or characteristics of the data that are used as inputs to a machine learning model. They represent the aspects of the data that the model will analyze and learn from to make predictions or uncover patterns. Features are crucial for ML models because they represent meaningful attributes that the model can interpret and learn from.
Examples of Features:
It is also worth noting that features are important for any ML model, whether in Supervised Learning, Unsupervised Learning, or Reinforcement Learning.
3.1. Features for Supervised Learning: In supervised learning, features are the input variables (independent variables) used by the model to predict an output variable (dependent variable). The dataset consists of labeled data, meaning each input (feature set) is paired with a corresponding output (label). Features can be numerical, categorical, or even a combination of different types, depending on the problem.
Examples:
Feature Characteristics:
3.2. Features for Unsupervised Learning: In unsupervised learning, features are the input variables used to find patterns, groupings, or structures in the data without any explicit output labels. The dataset is unlabeled, meaning the model does not have predefined outputs to learn from. Instead, it tries to uncover hidden patterns or relationships within the data.
Examples:
Feature Characteristics:
3.3. Features for Reinforcement Learning: In reinforcement learning, features are the inputs representing the state of the environment that the agent observes. These features guide the agent’s actions to maximize some notion of cumulative reward. The agent interacts with an environment, and at each time step, it receives a set of features representing the current state, takes an action, and receives feedback (reward) from the environment.
Examples:
Feature Characteristics:
Each type of learning relies on features to extract relevant information from the data, but the way these features are used and the nature of the input varies depending on the learning paradigm. So let us now talk about how to extract and select features for a machine learning model.
Topic 4: Feature Engineering Pipeline
Feature engineering involves creating new features or transforming existing ones to make the data more suitable for model training. Feature engineering thus transforms raw data into a format that best represents the underlying problem a machine learning algorithm aims to model. It involves manipulating data to create meaningful features, while also addressing inherent complexities and biases within the dataset.
The feature engineering pipeline typically has five steps, not necessarily performed in this order:

4.1. Feature Extraction: Feature Extraction is a process in machine learning and data science where raw data is transformed into a set of features that better represent the information needed for analysis and modeling. Unlike feature selection (discussed in 4.4), which involves choosing a subset of existing features, feature extraction involves creating new features that can capture the underlying structure of the data more effectively.
Some common Techniques for Feature Extraction:
4.1.1. Principal Component Analysis (PCA): PCA is a statistical technique that converts a set of possibly correlated variables into a set of uncorrelated variables called principal components. The first few principal components capture the most variance in the data, effectively reducing the dimensionality while preserving as much information as possible.
Application: PCA is used in image compression, where it reduces the dimensionality of image data while retaining key information.
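A minimal PCA sketch using scikit-learn, assuming a synthetic dataset in which the second feature is almost a multiple of the first (the data and the choice of two components are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 3 features; feature 2 is strongly correlated with feature 1.
x1 = rng.normal(size=200)
X = np.column_stack([
    x1,
    2 * x1 + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Project the 3 correlated features onto 2 uncorrelated principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print("reduced shape:", X_reduced.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Because two of the three original features carry nearly the same information, two components retain almost all of the variance. On real data, the explained variance ratio is what you would inspect to decide how many components to keep.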
4.1.2. Linear Discriminant Analysis (LDA): LDA is used to find a linear combination of features that best separate two or more classes of data. It’s commonly used in classification tasks to reduce the number of features while preserving class separability.
Application: LDA is often applied in face recognition, where it reduces the dimensionality of facial images while maximizing the separation between different individuals.
4.1.3. Wavelet Transform: Wavelet transform decomposes data into different frequency components, and each component is analyzed with a resolution matched to its scale. This is particularly useful for data that has different characteristics at different scales, like time-series data or images.
Application: In signal processing, wavelet transform is used for feature extraction in tasks like speech recognition or analyzing medical signals like ECG.
4.1.4. Text Feature Extraction (e.g., TF-IDF, Word Embeddings): Techniques like Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings (e.g., Word2Vec, GloVe) transform raw text into numerical vectors that capture the importance or semantic meaning of words in the context of a document or corpus. We will discuss this in more detail in a later module on Natural Language Processing.
Application: These techniques are crucial in NLP tasks such as document classification, sentiment analysis, and information retrieval.
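As a small illustration of TF-IDF, the sketch below uses scikit-learn's `TfidfVectorizer` on three made-up sentences (the corpus is an assumption for demonstration). A word that appears in only one document receives a higher weight in that document than a word shared across several documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus.
docs = [
    "the delivery was fast",
    "the delivery was late",
    "great product and fast delivery",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # sparse matrix: documents x vocabulary terms

# Maps each term to its column index in the matrix.
vocab = vectorizer.vocabulary_

# In the second document, "late" is unique to it while "the" also occurs elsewhere.
late_weight = tfidf[1, vocab["late"]]
the_weight = tfidf[1, vocab["the"]]

print("matrix shape:", tfidf.shape)
print("weight of 'late':", round(late_weight, 3))
print("weight of 'the':", round(the_weight, 3))
```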
4.1.5. Convolutional Neural Networks (CNNs): In deep learning, CNNs automatically extract hierarchical features from image data through convolutional layers. These layers detect features like edges, textures, and objects at different levels of abstraction. We will discuss this in more detail in a later module on Computer Vision.
Application: CNNs are widely used in computer vision tasks like image classification, object detection, and facial recognition.
4.2. Feature Construction: Feature Construction is the process of creating new features from the existing raw data to improve the performance of a machine learning model. The goal is to derive more informative and relevant features that capture the underlying patterns or relationships in the data more effectively.
Why Feature Construction is Important:
Some common Techniques for Feature Construction are:
4.2.1. Mathematical Transformations: In a dataset with features like height and weight, you might construct a new feature BMI (Body Mass Index) by applying the formula BMI = weight / (height^2). This new feature could be more indicative of health status than height or weight alone.
4.2.2. Combinations of Existing Features: In an e-commerce dataset, you might combine number_of_items_purchased and total_purchase_value to create a new feature average_item_value (total_purchase_value divided by number_of_items_purchased). This new feature could help explain purchasing behavior more effectively than either original feature alone.
4.2.3. Polynomial Features: For a regression problem where the relationship between the features and the target is non-linear, you might create polynomial features (e.g., x, x^2, x^3) to allow the model to capture more complex relationships.
4.2.4. Interaction Features: If you have two features, age and income, constructing an interaction feature like age * income could help the model capture the effect of income at different ages on the target variable.
4.2.5. Aggregations and Groupings: In a time-series dataset, you could construct features like monthly_average_sales or yearly_sales_growth to provide temporal insights that a model might use to make more accurate predictions.
4.2.6. Encoding Categorical Variables: In a dataset with categorical features like color, you might construct new binary features using one-hot encoding, where each color becomes a separate feature indicating the presence or absence of that color.
4.2.7. Text and Image Features: In text data, you might construct features like word_count, average_sentence_length, or frequency of specific keywords. In image data, you could extract features like edges, textures, or color histograms.
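Several of the constructions above (4.2.1 to 4.2.4) can be sketched in a few lines of pandas. The column names follow the examples in the text, while the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows; column names mirror the examples above, values are invented.
df = pd.DataFrame({
    "weight_kg": [60.0, 80.0],
    "height_m": [1.6, 2.0],
    "number_of_items_purchased": [2, 5],
    "total_purchase_value": [50.0, 200.0],
    "age": [25, 40],
    "income": [30000, 50000],
})

# Mathematical transformation: BMI = weight / height^2.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Combination of existing features: average value per purchased item.
df["average_item_value"] = df["total_purchase_value"] / df["number_of_items_purchased"]

# Polynomial feature: squared term lets a linear model fit a curved relationship.
df["age_squared"] = df["age"] ** 2

# Interaction feature: effect of income at different ages.
df["age_x_income"] = df["age"] * df["income"]

print(df[["bmi", "average_item_value", "age_squared", "age_x_income"]])
```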
4.3. Feature Improvement: As the name suggests, Feature Improvement aims to enhance existing features through techniques like imputing missing data values, standardization, and normalization. We use Feature Improvement when:
a) Features we wish to use are unusable by an ML model (e.g., they have missing values).
b) Features contain extreme outliers that may negatively impact our ML model's performance.
Improved features can lead to better model accuracy, generalization, and interpretability. Here are some examples of how feature improvement can be implemented:
4.3.1. Handling Missing Values
Original Feature: A dataset with missing values in the Age column [25, 30, NaN, 22, 28]
Improvement: Replace missing values with the median or mean of the Age column, or use more sophisticated methods like k-nearest neighbors (KNN) imputation. Example: [25, 30, 26.25, 22, 28] (mean imputation)
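The mean imputation above can be reproduced with pandas as follows; the median variant is shown alongside for comparison. This is a sketch on the five values from the example, not a recommendation of one strategy over the other.

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 30, np.nan, 22, 28], name="Age")

# mean() and median() skip NaN by default, so they use the four observed values.
age_mean_imputed = age.fillna(age.mean())
age_median_imputed = age.fillna(age.median())

print(age_mean_imputed.tolist())    # [25.0, 30.0, 26.25, 22.0, 28.0]
print(age_median_imputed.tolist())  # [25.0, 30.0, 26.5, 22.0, 28.0]
```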
4.3.2. Transforming Skewed Distributions
Original Feature: A highly skewed income variable [30000, 35000, 40000, 150000, 300000], where most people earn a small amount, but a few earn significantly more.
Improvement: Apply a log transformation, [10.31, 10.46, 10.60, 11.92, 12.61] (log transformation), to reduce skewness, making the data more normally distributed, which many models assume.
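The transformed values above are natural logarithms, which can be checked with NumPy:

```python
import numpy as np

income = np.array([30000, 35000, 40000, 150000, 300000])

# Natural log compresses the long right tail of the distribution.
log_income = np.log(income)

print(np.round(log_income, 2))
```

After the transform, the largest income is about 1.2 times the smallest on the log scale, compared with 10 times on the original scale. If a feature can contain zeros, `np.log1p` (which computes log(1 + x)) is a common alternative.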
4.3.3. Encoding Categorical Variables
Original Feature: A categorical variable like Color with values such as ['Red', 'Green', 'Blue'].
Improvement: Encode this variable using one-hot encoding Red: [1, 0, 0], Green: [0, 1, 0], Blue: [0, 0, 1] or label encoding Red: 1, Green: 2, Blue: 3.
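Both encodings can be sketched with pandas. The extra fourth value in the series and the explicit `label_map` dictionary are illustrative assumptions.

```python
import pandas as pd

colors = pd.Series(["Red", "Green", "Blue", "Green"], name="Color")

# One-hot encoding: one binary column per category (columns reordered to match the text).
one_hot = pd.get_dummies(colors, dtype=int)[["Red", "Green", "Blue"]]

# Label encoding: one integer per category, using the mapping from the example.
label_map = {"Red": 1, "Green": 2, "Blue": 3}
label_encoded = colors.map(label_map)

print(one_hot)
print(label_encoded.tolist())
```

Label encoding implies an ordering (Red < Green < Blue) that does not exist for colors, so one-hot encoding is usually the safer choice for unordered categories when the model treats inputs as numeric magnitudes.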
4.3.4. Creating Interaction Features
Original Feature: Two features Age and Income that might have an interaction effect on the target variable Age: [25, 30, 35], Income: [30000, 40000, 50000].
Improvement: Create an interaction feature by multiplying the two, which can capture the combined effect example: Age_Income: [750000, 1200000, 1750000] (multiplying Age by Income).
4.3.5. Binning Continuous Variables
Original Feature: A continuous variable like Age [23, 27, 34, 45, 52]
Improvement: Create age groups (bins) like [18-25, 26-35, 26-35, 36-45, 46-55], to reduce noise and capture more meaningful patterns.
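The binning above can be sketched with `pandas.cut`. The bin edges are chosen here so that each interval is closed on the right, which is the function's default behavior.

```python
import pandas as pd

age = pd.Series([23, 27, 34, 45, 52])

# Edges define the intervals (17, 25], (25, 35], (35, 45], (45, 55].
bins = [17, 25, 35, 45, 55]
labels = ["18-25", "26-35", "36-45", "46-55"]

age_group = pd.cut(age, bins=bins, labels=labels)

print(age_group.tolist())  # ['18-25', '26-35', '26-35', '36-45', '46-55']
```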
4.3.6. Dimensionality Reduction
Original Feature: A dataset with 100 features, some of which may be highly correlated or redundant [Feature1, Feature2, ..., Feature100]
Improvement: Use techniques like Principal Component Analysis (PCA) to reduce the number of features while retaining most of the information [Principal Component1, Principal Component2, ..., Principal Component10]
4.3.7. Scaling and Normalization
Original Feature: Numerical features on different scales, like Height in centimeters and Weight in kilograms Height: [150, 160, 170], Weight: [50, 60, 70]
Improvement: Apply scaling (e.g., Min-Max scaling) or normalization to bring all features onto a similar scale, which helps some algorithms converge faster (Min-Max Scaling): Height: [0.0, 0.5, 1.0], Weight: [0.0, 0.5, 1.0]
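A minimal sketch of Min-Max scaling, with standardization (zero mean, unit variance) shown alongside. The helper names `min_max_scale` and `standardize` are hypothetical, and both assume the feature is not constant.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]. Assumes max != min."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

height = [150, 160, 170]
weight = [50, 60, 70]

print(min_max_scale(height))  # [0.  0.5 1. ]
print(min_max_scale(weight))  # [0.  0.5 1. ]
print(standardize(height))
```

In practice, the minimum, maximum, mean, and standard deviation should be computed on the training data only and then reused to transform validation and test data.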
4.3.8. Time-Based Features
Original Feature: A timestamp feature like Date ['2023-08-01', '2023-08-02', '2023-08-03']
Improvement: Extract meaningful time-based features such as Day of the Week, Month, Quarter, or Elapsed Time Day of the Week: [Tuesday, Wednesday, Thursday], Month: [8, 8, 8]
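These time-based features can be derived with the pandas `.dt` accessor. The output column names are illustrative assumptions.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2023-08-01", "2023-08-02", "2023-08-03"]))

features = pd.DataFrame({
    "day_of_week": dates.dt.day_name(),
    "month": dates.dt.month,
    "quarter": dates.dt.quarter,
    # Elapsed time in days since the earliest date in the column.
    "days_since_first": (dates - dates.min()).dt.days,
})

print(features)
```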
4.3.9. Text Feature Engineering (discussed in more detail in later modules)
Original Feature: A text column containing customer reviews ["Great product!", "Terrible service", "Very satisfied"]
Improvement: Transform text into features using techniques like TF-IDF (Term Frequency-Inverse Document Frequency), word embeddings, or sentiment analysis. Example (illustrative TF-IDF values): Feature1: [0.8, 0.1, 0.5], Feature2: [0.2, 0.9, 0.3], etc.
Feature improvement involves refining raw features based on insights gained from EDA or domain knowledge. Each step, from handling missing values to creating interaction terms or scaling data, is aimed at making the features more informative for the model, thereby improving its performance.
4.4. Feature Selection: Feature selection is the process of identifying and choosing a subset of relevant features from a set of features that contribute most significantly to predicting the target variable. Its primary goal is to reduce the number of input variables, thereby simplifying models, mitigating overfitting, and enhancing generalization by eliminating noise or irrelevant data.
Why Feature Selection is Important:
Some common Feature Selection Methods with Examples are given below:
4.4.1. Filter Methods: These methods evaluate the relevance of features by examining their statistical properties. They are independent of any machine learning algorithms.
4.4.1.1. Correlation Coefficient: Calculate the correlation between each feature and the target variable. Select features with a high absolute correlation value.
Example: In a dataset with features like age, income, and years_of_education, you might find that income has a high correlation with the target variable (e.g., purchase likelihood), while years_of_education has a low correlation.
Action: Select income and possibly drop years_of_education.
4.4.1.2. Chi-Square Test (for categorical data): Assess the dependency between categorical features and the target variable.
Example: In a dataset with categorical features like gender, region, and purchased_product, you might find that region is not significantly related to the target variable.
Action: Drop region from the feature set.
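A minimal sketch of the correlation filter, using synthetic data in which a hypothetical numeric target `purchase_score` depends on `income` only. The data-generating process and the 0.3 cutoff are assumptions made for illustration; a suitable threshold depends on the problem.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Hypothetical features.
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50000, 10000, n),
    "years_of_education": rng.normal(14, 2, n),
})

# Hypothetical target: driven by income plus random noise.
df["purchase_score"] = 0.0001 * df["income"] + rng.normal(0, 0.5, n)

# Absolute correlation of every feature with the target.
corr_with_target = (
    df.drop(columns="purchase_score").corrwith(df["purchase_score"]).abs()
)

# Keep features whose absolute correlation exceeds the (assumed) threshold.
selected = corr_with_target[corr_with_target > 0.3].index.tolist()

print(corr_with_target.round(3))
print("selected:", selected)
```

Correlation only measures linear association, so a feature with a low coefficient can still matter through a non-linear relationship or an interaction with another feature.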
4.4.2. Wrapper Methods: These methods evaluate feature subsets by training and testing a model, iteratively adding or removing features based on model performance.
4.4.2.1. Recursive Feature Elimination (RFE): Recursively build models, starting with all features and removing the least important ones (based on a chosen model) until a specified number of features is reached.
Example: In a dataset with 20 features, you might start with all of them and iteratively remove the least significant feature according to the model’s coefficients or importance scores, ending up with the top 5 most important features.
Action: Retain the most impactful features and discard the rest.
4.4.2.2. Forward Selection: Start with an empty set of features, and iteratively add the feature that improves model performance the most until adding more features doesn’t lead to significant improvement.
Example: In a dataset, begin with no features, add feature1, and see the improvement. If adding feature2 further improves performance, keep it; otherwise, stop.
Action: Build a model with the best subset of features.
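The RFE example (20 features reduced to 5) can be sketched with scikit-learn. The synthetic dataset from `make_classification` and the choice of logistic regression as the underlying model are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: 20 features, of which 5 are informative.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# RFE refits the model repeatedly, dropping the weakest feature each round
# (ranked by the size of the model's coefficients) until 5 remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)

selected_idx = [i for i, keep in enumerate(rfe.support_) if keep]
X_selected = X[:, rfe.support_]

print("selected feature indices:", selected_idx)
print("reduced shape:", X_selected.shape)
```

Wrapper methods retrain the model many times, so they tend to be slower than filter methods on wide datasets.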
4.4.3. Embedded Methods
These methods perform feature selection during the model training process. Some machine learning algorithms have built-in mechanisms for feature selection, similar in spirit to ‘build as you fly’.
4.4.3.1. Lasso Regression (L1 Regularization): Apply a penalty to the coefficients of the regression model that tends to shrink the less important feature coefficients to zero.
Example: In a regression problem with many features, Lasso might reduce the coefficients of irrelevant features to zero, effectively performing feature selection.
Action: Keep only the features with non-zero coefficients.
4.4.3.2. Tree-Based Methods (e.g., Random Forest, Gradient Boosting): Use the feature importance scores derived from tree-based models to select relevant features.
Example: Train a Random Forest on a dataset and evaluate the feature importance scores. If feature3 has an importance score close to zero, it might be irrelevant.
Action: Discard features with low importance scores.
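As a sketch of the Lasso case, the snippet below fits scikit-learn's `Lasso` on synthetic data where only the first two of five features influence the target. The data, the true coefficients, and `alpha=0.1` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Hypothetical data: 5 features, only features 0 and 1 affect y.
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty (strength alpha) pushes coefficients of unhelpful features to exactly zero.
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Indices of features whose coefficients survived the penalty.
kept = np.flatnonzero(lasso.coef_)

print("coefficients:", np.round(lasso.coef_, 3))
print("kept feature indices:", kept.tolist())
```

The same pattern applies to tree-based models: after fitting, the `feature_importances_` attribute of a scikit-learn Random Forest can be inspected and low-scoring features dropped. In both cases the regularization strength or importance cutoff is a tuning choice, typically made with cross-validation.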
Practical Example of Feature Selection: Imagine we are working on a dataset to predict customer churn, and we have 10 features: age, income, years_with_company, monthly_spend, contract_type, customer_support_calls, is_active, region, gender, and last_purchase_amount.
A typical feature selection process might go as follows:
Filter Method (Correlation): Calculate the correlation of each feature with the target variable (churn).
Suppose we find that age and income have very low correlations with churn.
Action: Remove age and income from the dataset.
Wrapper Method (RFE): Use Recursive Feature Elimination with a logistic regression model to evaluate the remaining features.
Suppose we find that contract_type, customer_support_calls, and monthly_spend are the most predictive features.
Action: Select these three features for the final model.
Embedded Method (Random Forest): Train a Random Forest model and obtain feature importance scores.
Suppose we find that last_purchase_amount and years_with_company have very low importance.
Action: Exclude these low-importance features from the model.
Summary: Feature selection is a crucial step in building efficient and effective machine learning models. By applying methods such as filter, wrapper, and embedded techniques, you can identify the most relevant features, reduce noise, and improve model performance. This process helps streamline your model, making it more accurate and interpretable.
4.5. Feature Learning: Feature Learning is the process by which a machine learning model automatically discovers the most relevant features from raw data during the training process. Unlike manual feature engineering, where data scientists or engineers explicitly create features, feature learning allows the model to identify and learn the best representations of the data that will improve its performance on a given task.
Why Feature Learning is Important:
‘Automatic Feature Discovery’ - That sounds quite similar to what we discussed about Deep learning in Module 1. A common question that comes up from my students in AI class is, "Why do we have to learn feature engineering? I thought you said Deep learning promises to eliminate the need for it?"
They have a point. Deep learning does promise to automate feature creation, which is why Deep Learning is sometimes also called Feature learning.
Algorithms can automatically learn and extract many features, but we are not yet at a stage where all feature engineering can be automated. Even though feature learning, particularly through deep learning, has automated much of the discovery of relevant features, there are still several important reasons why manual feature engineering remains valuable and sometimes necessary:
4.5.1. Domain Expertise: Manual feature engineering allows experts to inject domain-specific knowledge into the model. This can be crucial in fields like healthcare, finance, or engineering, where understanding the intricacies of the data can lead to more meaningful features that are directly relevant to the problem at hand.
4.5.2. Interpretability: Models based on manually engineered features are often easier to interpret. In regulated industries, it is important to explain how a model arrives at its decisions. Manually crafted features can be more transparent and interpretable, helping to satisfy these requirements.
4.5.3. Resource Efficiency:
Computational Resources: Deep learning models, particularly those used for feature learning, can be computationally expensive and require significant amounts of data to train effectively. In situations where computational resources are limited, or data is scarce, manual feature engineering can provide good results with simpler models that are faster and cheaper to train.
Small Datasets: When the available dataset is small, deep learning might overfit or fail to generalize well. In such cases, manually engineered features can help in creating robust models that perform better on limited data.
4.5.4. Enhanced Model Performance:
Augmenting Deep Learning: Even in deep learning pipelines, manual feature engineering can be used to create features that are difficult for a neural network to learn on its own. For example, calculating domain-specific ratios, aggregations, or time-based features can provide additional signals that enhance the model’s performance.
Hybrid Approaches: In many real-world applications, a hybrid approach combining manual feature engineering with feature learning can yield better results. Engineers might first create a set of well-understood, meaningful features and then allow a deep learning model to learn additional complex features from the raw data.
4.5.5. Generalization Across Different Tasks:
Transferability: Manually engineered features can sometimes be more easily transferable across different tasks or models. For example, a feature engineered to capture seasonality in sales data might be useful across various models and use cases, regardless of the specific deep learning architecture being employed.
Consistency Across Models: Manually engineered features provide consistency across different models, allowing easier comparison and combination of different algorithms (e.g., using ensemble methods). This consistency can be lost when relying solely on deep learning, where features are learned in a way that is specific to the particular architecture and training process.
4.5.6. Data Types Not Suited for Deep Learning:
Structured Data: Deep learning excels at unstructured data like images, text, and audio. However, for structured data (e.g., tabular data in spreadsheets or databases), traditional machine learning methods often perform well with carefully engineered features, sometimes even outperforming deep learning.
Sparse Data: In cases where data is sparse or contains a lot of missing values, manually engineered features can help make the data more usable for a machine learning model, whereas deep learning might struggle.
4.5.7. Practical and Business Constraints:
Time to Market: Manual feature engineering can sometimes lead to faster deployment of models, especially when there is a pressing business need. Waiting for a deep learning model to be trained, tuned, and validated might not be feasible in time-sensitive situations.
Regulatory and Compliance Requirements: In some industries, regulations might require specific features to be used or require that the decision-making process be fully understandable. Manually engineered features can be more easily documented and justified in these contexts.
While deep learning and feature learning have greatly reduced the need for manual feature engineering in many applications, there are still significant scenarios where manual feature engineering is valuable or even necessary. Domain expertise, interpretability, resource efficiency, the ability to handle small or structured datasets, and practical business considerations all contribute to the continued relevance of manual feature engineering in the broader landscape of machine learning. Combining the strengths of both approaches—manual feature engineering and feature learning—often results in the best outcomes.
Some types of Feature Learning:
Unsupervised Feature Learning: The objective is to learn features from the data without using labels.
Techniques: Autoencoders, Principal Component Analysis (PCA), and clustering methods like k-means are examples of unsupervised feature learning. These methods learn representations by finding patterns, compressing data, or grouping similar data points together.
Example: An autoencoder might learn to compress and reconstruct images, where the compressed representation (latent space) acts as learned features that capture essential aspects of the images.
Supervised Feature Learning: The objective is to learn features that are directly useful for a predictive task with labeled data.
Techniques: Deep neural networks, such as Convolutional Neural Networks (CNNs) for images and Recurrent Neural Networks (RNNs) for sequences, are examples of supervised feature learning. These models learn hierarchical features that are optimized for the task, such as image classification or sentiment analysis.
Example: In a CNN, the lower layers might learn to detect edges and textures in images, while the higher layers combine these to recognize more complex structures like faces or objects.
Semi-Supervised and Self-Supervised Feature Learning: The objective is to learn features from a mixture of labeled and unlabeled data or through labels generated from the data itself.
Techniques: Contrastive learning, generative adversarial networks (GANs), and other self-supervised learning methods fall into this category. They leverage large amounts of unlabeled data to learn useful features that can then be fine-tuned with labeled data.
Example: In self-supervised learning, a model might be trained to predict missing parts of an image or the next word in a sentence. The learned features from this task can then be fine-tuned for specific downstream tasks like image classification or language translation.
Examples of Feature Learning through Applications:
Image Recognition (CNNs): A CNN learns features such as edges, textures, and shapes in the early layers, which are then combined into more abstract representations like eyes or wheels in deeper layers. Finally, the model combines these high-level features to recognize entire objects like cars or faces.
The model automatically discovers and learns the most relevant features needed to classify images accurately.
Natural Language Processing (Word Embeddings): Word embedding techniques like Word2Vec or BERT learn dense vector representations of words based on their context in large text corpora. These vectors capture semantic meanings and relationships between words, such as “king” being related to “queen” or “Paris” being related to “France.”
The learned word embeddings can be used as features for various NLP tasks, such as sentiment analysis, machine translation, or text classification.
Audio Processing (Spectrograms in Deep Learning): In speech recognition, raw audio data is converted into spectrograms, which are then fed into deep learning models. The model learns features that capture important patterns in the audio, like phonemes or syllables, which are used to transcribe speech.
The model learns to extract features that are crucial for understanding and processing spoken language.
Feature engineering remains a cornerstone of effective machine learning, bridging the gap between raw data and powerful predictive models. While advancements in deep learning have introduced automated feature learning, the value of manual feature engineering persists, particularly in scenarios requiring domain expertise, interpretability, and resource efficiency. By thoughtfully crafting and selecting features, data scientists can significantly enhance model performance, ensuring that even simple models can yield robust and actionable insights. Ultimately, the art and science of feature engineering continue to play a vital role in achieving superior results, whether as a standalone practice or in tandem with modern feature learning techniques.
As a photographer, it’s important to get the visuals right while establishing your online presence. Having a unique and professional portfolio will make you stand out to potential clients. The only problem? Most website builders out there offer cookie-cutter options — making lots of portfolios look the same.
That’s where a platform like Webflow comes to play. With Webflow you can either design and build a website from the ground up (without writing code) or start with a template that you can customize every aspect of. From unique animations and interactions to web app-like features, you have the opportunity to make your photography portfolio site stand out from the rest.
So, we put together a few photography portfolio websites that you can use yourself — whether you want to keep them the way they are or completely customize them to your liking.
Here are 12 photography portfolio templates you can use with Webflow to create your own personal platform for showing off your work.

Subscribe to our newsletter to receive our latest blogs, recommended digital courses, and more to unlock growth Mindset

Topic 1: Data Understanding: Exploratory Data Analysis (EDA)

In the last module Lesson 5.3, we discussed how we can get data collected in a central data repository that we call Data Warehouse. So now the questions arises, how do we use this data that exists in the Data warehouse and/or Data Lake to build Machine Learning (ML) models?
However, before we build models, we need to gain a deeper understanding of our data. The process of gaining this deeper understanding by exploring the data is called Exploratory Data Analysis (EDA).
Exploratory Data Analysis (EDA), an initial phase of any data science project, is a critical step in the data analysis process, used to understand the underlying structure, patterns, and relationships within a dataset before formal modeling or hypothesis testing. It's like detective work: you dig into your data to understand its characteristics, identify patterns, and uncover potential insights. Examples of EDA include looking at summary statistics such as the mean and median to detect outliers, drawing histograms to see the distribution of the data, and exploring the relationship between two features on a two-dimensional plot.
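To make these EDA steps concrete, here is a minimal sketch using pandas and NumPy. The DataFrame, its column names, and its values are made up purely for illustration; a real project would load data from the warehouse instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": [23, 27, 34, 45, 52, 31],
    "income": [30000, 35000, 40000, 150000, 300000, 42000],
})

# Summary statistics: a mean far above the median hints at outliers or skew.
summary = df["income"].agg(["mean", "median"])

# Histogram counts (the numbers behind a histogram plot) using 3 equal-width bins.
counts, bin_edges = np.histogram(df["income"], bins=3)

# Relationship between two features: Pearson correlation.
corr = df["age"].corr(df["income"])

print(summary)
print(counts, bin_edges)
print(round(corr, 3))
```

Here the mean income is more than double the median, which is the kind of signal that would prompt a closer look at the two largest values.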
The deeper your data science and leadership teams' understanding of the data, the higher the ROI of your machine learning models will be.
You can read more about this in our article Data Science: Nurturing a data fluent culture that compounds growth.
Topic 2: Model Centric Vs Data Centric Approach to AI
In academic settings, machine learning courses often provide clean, well-organized datasets. Students then focus on finding the best ML model for these datasets. This approach, which emphasizes improving the model to achieve incremental gains, is called the Model-Centric Approach. It's particularly crucial for organizations in competitive environments where technological superiority is a key differentiator. In such cases, investing in cutting-edge models can be vital.

Strategic Benefits:
However, this approach assumes that the available data is of high quality and well suited to the task at hand. If the data is flawed, no amount of model sophistication will lead to the desired outcomes, i.e., garbage in, garbage out. Real-world data is often messy, unorganized, large, or in an odd format. This is where the data-centric approach comes in.
No wonder that, in the 2020 State of Data Science survey by Anaconda, respondents reported spending on average 45% of their time getting data ready (collecting and cleansing) before they could use it to develop models and visualizations.

Most organizations can get a much better yield by focusing on improving the quality, relevance, and robustness of the data used to train AI models. This is called the Data-Centric Approach. Data is the most valuable asset an organization possesses, and maximizing its potential can yield significant strategic advantages.

Strategic Benefits:
This approach requires a deep commitment to data governance, quality assurance, and possibly a cultural shift within the organization to prioritize data as a key strategic asset.
We advise taking a Data-First, Model-Second Hybrid Approach: deliver incremental value to end customers and internal stakeholders by developing AI models while data quality is being improved iteratively.

In the hybrid approach, the organization builds a data-first culture by focusing on improving data quality while, in parallel, using techniques such as the following (not an exhaustive list) to deliver value:
2.1 Curriculum Learning: Training on an easier dataset first and progressing to more complex datasets as we are able to improve the quality of the data.
2.2 Confident learning: Focusing on identifying and correcting label errors in training data. It leverages the model’s predictions to estimate the confidence of each label in the dataset and detects when labels may be incorrect or noisy.
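The idea behind confident learning can be sketched in a few lines. What follows is a simplified illustration, not a full implementation of the published method: the function name flag_label_issues and the toy probabilities are hypothetical, and in practice the predicted probabilities should come from out-of-sample predictions (for example, cross-validation).

```python
import numpy as np

def flag_label_issues(labels, pred_probs):
    """Flag examples whose given label disagrees with a confident prediction."""
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]
    # Per-class threshold: the average predicted probability of class j
    # over the examples currently labeled j.
    thresholds = np.array(
        [pred_probs[labels == j, j].mean() for j in range(n_classes)]
    )
    # Keep only the probabilities that clear their class threshold.
    confident = np.where(pred_probs >= thresholds, pred_probs, -np.inf)
    confident_class = confident.argmax(axis=1)
    has_confident = np.isfinite(confident).any(axis=1)
    # Suspect: some class is confidently predicted and it is not the given label.
    return has_confident & (confident_class != labels)

# Toy data: the third example is labeled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7],
]
issues = flag_label_issues(labels, pred_probs)
print(issues)
```

Flagged examples are candidates for review or relabeling, not automatic corrections.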
Now that we know why a data-first mindset is needed, feature engineering becomes important. But before that, let us understand features.
Topic 3: Features
Features are individual measurable properties or characteristics of the data that are used as inputs to a machine learning model. They represent the aspects of the data that the model will analyze and learn from to make predictions or uncover patterns. Features are crucial for ML models because they represent meaningful attributes that the model can interpret and learn from.
Examples of Features:
Also it is worth noting that features are important for any ML model, be it Supervised Learning, Unsupervised Learning, or Reinforcement Learning.
3.1. Features for Supervised Learning: In supervised learning, features are the input variables (independent variables) used by the model to predict an output variable (dependent variable). The dataset consists of labeled data, meaning each input (feature set) is paired with a corresponding output (label). Features can be numerical, categorical, or even a combination of different types, depending on the problem.
Examples:
Feature Characteristics:
3.2. Features for Unsupervised Learning: In unsupervised learning, features are the input variables used to find patterns, groupings, or structures in the data without any explicit output labels. The dataset is unlabeled, meaning the model does not have predefined outputs to learn from. Instead, it tries to uncover hidden patterns or relationships within the data.
Examples:
Feature Characteristics:
3.3. Features for Reinforcement Learning: In reinforcement learning, features are the inputs representing the state of the environment that the agent observes. These features guide the agent’s actions to maximize some notion of cumulative reward. The agent interacts with an environment, and at each time step, it receives a set of features representing the current state, takes an action, and receives feedback (reward) from the environment.
Examples:
Feature Characteristics:
Each type of learning relies on features to extract relevant information from the data, but the way these features are used and the nature of the input vary depending on the learning paradigm. So let us now talk about how to extract and select features for a machine learning model.
Topic 4: Feature Engineering Pipeline
Feature engineering involves creating new features or transforming existing ones to make the data more suitable for model training. Feature engineering thus transforms raw data into a format that best represents the underlying problem a machine learning algorithm aims to model. It involves manipulating data to create meaningful features, while also addressing inherent complexities and biases within the dataset.
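There are typically five steps in the feature engineering pipeline, not necessarily performed in this order: Feature Extraction, Feature Construction, Feature Improvement, Feature Selection, and Feature Learning.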

4.1. Feature Extraction: Feature Extraction is a process in machine learning and data science where raw data is transformed into a set of features that better represent the information needed for analysis and modeling. Unlike feature selection (covered in 4.4), which involves choosing a subset of existing features, feature extraction involves creating new features that can capture the underlying structure of the data more effectively.
Some common Techniques for Feature Extraction:
4.1.1. Principal Component Analysis (PCA): PCA is a statistical technique that converts a set of possibly correlated variables into a set of uncorrelated variables called principal components. The first few principal components capture the most variance in the data, effectively reducing the dimensionality while preserving as much information as possible.
Application: PCA is used in image compression, where it reduces the dimensionality of image data while retaining key information.
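As a minimal sketch of PCA, the example below centers the data and uses NumPy's singular value decomposition. The helper name pca and the synthetic two-feature dataset are assumptions made for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values are proportional to the variance along each direction.
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    return X_centered @ components.T, explained_variance_ratio

# Synthetic data: two strongly correlated features.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Because the two features are nearly collinear, a single principal component retains almost all of the variance.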
4.1.2. Linear Discriminant Analysis (LDA): LDA is used to find a linear combination of features that best separate two or more classes of data. It’s commonly used in classification tasks to reduce the number of features while preserving class separability.
Application: LDA is often applied in face recognition, where it reduces the dimensionality of facial images while maximizing the separation between different individuals.
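For the two-class case, LDA reduces to Fisher's linear discriminant, which can be sketched with NumPy. The function name and the synthetic data below are hypothetical; multi-class LDA needs the more general between-class scatter formulation.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Unit vector that best separates two classes (labels 0 and 1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix: summed scatter of each class around its mean.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Fisher's solution: w is proportional to Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic data: classes differ along feature 0; feature 1 is high-variance noise.
rng = np.random.default_rng(0)
X0 = np.column_stack([rng.normal(0, 0.5, 50), rng.normal(0, 3, 50)])
X1 = np.column_stack([rng.normal(5, 0.5, 50), rng.normal(0, 3, 50)])
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

w = fisher_lda_direction(X, y)
projected = X @ w  # two features reduced to one
print(w)
```

The learned direction points almost entirely along feature 0, so the one-dimensional projection keeps the class separation while ignoring the noisy feature.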
4.1.3. Wavelet Transform: Wavelet transform decomposes data into different frequency components, and each component is analyzed with a resolution matched to its scale. This is particularly useful for data that has different characteristics at different scales, like time-series data or images.
Application: In signal processing, wavelet transform is used for feature extraction in tasks like speech recognition or analyzing medical signals like ECG.
4.1.4. Text Feature Extraction (e.g., TF-IDF, Word Embeddings): Techniques like Term Frequency-Inverse Document Frequency (TF-IDF) and word embeddings (e.g., Word2Vec, GloVe) transform raw text into numerical vectors that capture the importance or semantic meaning of words in the context of a document or corpus. We will discuss this in more detail in a later module on Natural Language Processing.
Application: These techniques are crucial in NLP tasks such as document classification, sentiment analysis, and information retrieval.
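To show how TF-IDF turns text into numbers, here is a small sketch using only the Python standard library. It uses the basic unsmoothed formula tf * log(N / df); libraries typically use smoothed or normalized variants, so their values will differ. The function name and the three short documents are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[t] / len(tokens)) * math.log(n_docs / df[t])
            for t in vocab
        ])
    return vocab, vectors

docs = ["great product", "terrible service", "great service"]
vocab, vectors = tfidf(docs)
print(vocab)
print(vectors[0])
```

In the first document, "product" gets a higher weight than "great" because it appears in fewer documents across the corpus.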
4.1.5. Convolutional Neural Networks (CNNs): In deep learning, CNNs automatically extract hierarchical features from image data through convolutional layers. These layers detect features like edges, textures, and objects at different levels of abstraction. We will discuss this in more detail in a later module on Computer Vision.
Application: CNNs are widely used in computer vision tasks like image classification, object detection, and facial recognition.
4.2. Feature Construction: Feature Construction is the process of creating new features from the existing raw data to improve the performance of a machine learning model. The goal is to derive more informative and relevant features that capture the underlying patterns or relationships in the data more effectively.
Why Feature Construction is Important:
Some common Techniques for Feature Construction are:
4.2.1. Mathematical Transformations: In a dataset with features like height and weight, you might construct a new feature BMI (Body Mass Index) by applying the formula BMI = weight / (height^2). This new feature could be more indicative of health status than height or weight alone.
4.2.2. Combinations of Existing Features: In an e-commerce dataset, you might divide total_purchase_value by number_of_items_purchased to create a new feature, average_item_value. This new feature could help explain purchasing behavior more effectively than either of the original features alone.
4.2.3. Polynomial Features: For a regression problem where the relationship between the features and the target is non-linear, you might create polynomial features (e.g., x, x^2, x^3) to allow the model to capture more complex relationships.
4.2.4. Interaction Features: If you have two features, age and income, constructing an interaction feature like age * income could help the model capture the effect of income at different ages on the target variable.
4.2.5. Aggregations and Groupings: In a time-series dataset, you could construct features like monthly_average_sales or yearly_sales_growth to provide temporal insights that a model might use to make more accurate predictions.
4.2.6. Encoding Categorical Variables: In a dataset with categorical features like color, you might construct new binary features using one-hot encoding, where each color becomes a separate feature indicating the presence or absence of that color.
4.2.7. Text and Image Features: In text data, you might construct features like word_count, average_sentence_length, or frequency of specific keywords. In image data, you could extract features like edges, textures, or color histograms.
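Several of the construction techniques above can be sketched in a few lines of pandas. The DataFrame below is invented for illustration, and we assume height is in meters and weight in kilograms so that the BMI formula applies.

```python
import pandas as pd

# Hypothetical data; all values are made up for illustration.
df = pd.DataFrame({
    "height_m": [1.60, 1.75, 1.80],
    "weight_kg": [64.0, 70.0, 81.0],
    "number_of_items_purchased": [2, 5, 4],
    "total_purchase_value": [50.0, 100.0, 60.0],
    "age": [25, 30, 35],
    "income": [30000, 40000, 50000],
    "color": ["Red", "Green", "Blue"],
})

# 4.2.1 Mathematical transformation: BMI = weight / height^2
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
# 4.2.2 Combination of existing features
df["average_item_value"] = df["total_purchase_value"] / df["number_of_items_purchased"]
# 4.2.3 Polynomial feature
df["age_squared"] = df["age"] ** 2
# 4.2.4 Interaction feature
df["age_x_income"] = df["age"] * df["income"]
# 4.2.6 One-hot encoding of a categorical variable
df = pd.get_dummies(df, columns=["color"], prefix="color")

print(df.head())
```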
4.3. Feature Improvement: As the name suggests, Feature Improvement aims to enhance existing features through techniques like imputing missing data values, standardization, and normalization. We use Feature Improvement when:
a) Features we wish to use are unusable by an ML model (e.g., they have missing values).
b) Features contain extreme outliers that may negatively impact our ML model's performance.
Improved features can lead to better model accuracy, generalization, and interpretability. Here are some examples of how feature improvement can be implemented:
4.3.1. Handling Missing Values
Original Feature: A dataset with missing values in the Age column [25, 30, NaN, 22, 28]
Improvement: Replace missing values with the median or mean of the Age column, or use more sophisticated methods like k-nearest neighbors (KNN) imputation. Example: [25, 30, 26.25, 22, 28] (mean imputation)
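The mean and median imputation described above can be reproduced with pandas as a quick sketch (KNN imputation is not shown here):

```python
import numpy as np
import pandas as pd

age = pd.Series([25, 30, np.nan, 22, 28], name="Age")

# mean() and median() skip NaN by default, so they use the four known values.
age_mean_imputed = age.fillna(age.mean())
age_median_imputed = age.fillna(age.median())

print(age_mean_imputed.tolist())
print(age_median_imputed.tolist())
```

The mean of the known values is 26.25 and the median is 26.5; the median is usually the safer choice when the column contains outliers.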
4.3.2. Transforming Skewed Distributions
Original Feature: A highly skewed income variable [30000, 35000, 40000, 150000, 300000], where most people earn a small amount, but a few earn significantly more.
Improvement: Apply a natural-log transformation, giving approximately [10.31, 10.46, 10.60, 11.92, 12.61], to reduce skewness. This brings the data closer to a normal distribution, which many models assume.
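A short NumPy sketch of this transformation, assuming the natural logarithm is intended (it matches the rounded values in the example):

```python
import numpy as np

income = np.array([30000, 35000, 40000, 150000, 300000], dtype=float)

# Natural log compresses large values far more than small ones.
log_income = np.log(income)

print(np.round(log_income, 2))
```

The largest income is 10 times the smallest before the transformation but only about 1.2 times after it. For data that can contain zeros, np.log1p is a common alternative.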
4.3.3. Encoding Categorical Variables
Original Feature: A categorical variable like Color with values such as ['Red', 'Green', 'Blue'].
Improvement: Encode this variable using one-hot encoding (Red: [1, 0, 0], Green: [0, 1, 0], Blue: [0, 0, 1]) or label encoding (Red: 1, Green: 2, Blue: 3).
4.3.4. Creating Interaction Features
Original Feature: Two features, Age and Income, that might have an interaction effect on the target variable. Age: [25, 30, 35], Income: [30000, 40000, 50000]
Improvement: Create an interaction feature by multiplying the two, which can capture their combined effect. Example: Age_Income: [750000, 1200000, 1750000] (Age multiplied by Income).
4.3.5. Binning Continuous Variables
Original Feature: A continuous variable like Age [23, 27, 34, 45, 52]
Improvement: Create age groups (bins) like [18-25, 26-35, 26-35, 36-45, 46-55], to reduce noise and capture more meaningful patterns.
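The binning above can be sketched with pandas' cut function. The bin edges are an assumption chosen so that the right-inclusive intervals line up with the age-group labels in the example.

```python
import pandas as pd

age = pd.Series([23, 27, 34, 45, 52])

# Right-inclusive intervals: (17, 25], (25, 35], (35, 45], (45, 55]
bins = [17, 25, 35, 45, 55]
labels = ["18-25", "26-35", "36-45", "46-55"]
age_group = pd.cut(age, bins=bins, labels=labels)

print(age_group.tolist())
```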
4.3.6. Dimensionality Reduction
Original Feature: A dataset with 100 features, some of which may be highly correlated or redundant [Feature1, Feature2, ..., Feature100]
Improvement: Use techniques like Principal Component Analysis (PCA) to reduce the number of features while retaining most of the information [Principal Component1, Principal Component2, ..., Principal Component10]
4.3.7. Scaling and Normalization
Original Feature: Numerical features on different scales, like Height in centimeters and Weight in kilograms. Height: [150, 160, 170], Weight: [50, 60, 70]
Improvement: Apply scaling (e.g., Min-Max scaling) or normalization to bring all features onto a similar scale, which helps some algorithms converge faster. After Min-Max scaling: Height: [0.0, 0.5, 1.0], Weight: [0.0, 0.5, 1.0]
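Min-Max scaling maps each value to (x - min) / (max - min). A minimal NumPy sketch follows; the helper name is hypothetical, and it assumes the feature is not constant (otherwise the denominator is zero).

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    values = np.asarray(values, dtype=float)
    # Assumes max != min; a constant feature would need special handling.
    return (values - values.min()) / (values.max() - values.min())

height_scaled = min_max_scale([150, 160, 170])
weight_scaled = min_max_scale([50, 60, 70])

print(height_scaled, weight_scaled)
```

In a real pipeline, the minimum and maximum should be computed on the training data only and reused for validation and test data.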
4.3.8. Time-Based Features
Original Feature: A timestamp feature like Date ['2023-08-01', '2023-08-02', '2023-08-03']
Improvement: Extract meaningful time-based features such as Day of the Week, Month, Quarter, or Elapsed Time. Day of the Week: [Tuesday, Wednesday, Thursday], Month: [8, 8, 8]
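These time-based features can be extracted with pandas' datetime accessor. The output column names below are illustrative choices.

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(["2023-08-01", "2023-08-02", "2023-08-03"]))

features = pd.DataFrame({
    "day_of_week": dates.dt.day_name(),
    "month": dates.dt.month,
    "quarter": dates.dt.quarter,
    # Elapsed time, measured in days since the earliest date in the column.
    "days_since_first": (dates - dates.min()).dt.days,
})

print(features)
```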
4.3.9. Text Feature Engineering (discussed in more detail in later modules)
Original Feature: A text column containing customer reviews ["Great product!", "Terrible service", "Very satisfied"]
Improvement: Transform text into numerical features using techniques like TF-IDF (Term Frequency-Inverse Document Frequency), word embeddings, or sentiment analysis. Illustrative output: Feature1: [0.8, 0.1, 0.5], Feature2: [0.2, 0.9, 0.3], etc.
Feature improvement involves refining raw features based on insights gained from EDA or domain knowledge. Each step, from handling missing values to creating interaction terms or scaling data, is aimed at making the features more informative for the model, thereby improving its performance.
4.4. Feature Selection: Feature selection is the process of identifying and choosing a subset of relevant features from a set of features that contribute most significantly to predicting the target variable. Its primary goal is to reduce the number of input variables, thereby simplifying models, mitigating overfitting, and enhancing generalization by eliminating noise or irrelevant data.
Why Feature Selection is Important:
Some common Feature Selection Methods with Examples are given below:
4.4.1. Filter Methods: These methods evaluate the relevance of features by examining their statistical properties. They are independent of any machine learning algorithms.
4.4.1.1. Correlation Coefficient: Calculate the correlation between each feature and the target variable. Select features with a high absolute correlation value.
Example: In a dataset with features like age, income, and years_of_education, you might find that income has a high correlation with the target variable (e.g., purchase likelihood), while years_of_education has a low correlation.
Action: Select income and possibly drop years_of_education.
4.4.1.2. Chi-Square Test (for categorical data): Assess the dependency between categorical features and the target variable.
Example: In a dataset with categorical features like gender, region, and purchased_product, you might find that region is not significantly related to the target variable.
Action: Drop region from the feature set.
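The correlation filter from 4.4.1.1 can be sketched with pandas. The data is synthetic and deliberately constructed so that the target depends mainly on income; the 0.3 cutoff is an arbitrary assumption that would be tuned in practice.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50000, 15000, n),
    "years_of_education": rng.normal(14, 2, n),
})
# Synthetic target driven by income plus noise (an assumption for this demo).
df["purchase_likelihood"] = df["income"] / 50000 + rng.normal(0, 0.1, n)

# Absolute Pearson correlation of each feature with the target.
features = df.drop(columns="purchase_likelihood")
corr = features.corrwith(df["purchase_likelihood"]).abs()

# Keep features whose absolute correlation exceeds the chosen threshold.
selected = corr[corr > 0.3].index.tolist()
print(corr.round(3))
print(selected)
```

Note that Pearson correlation only captures linear relationships, so a feature with low correlation can still be useful to a non-linear model.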
4.4.2. Wrapper Methods: These methods evaluate feature subsets by training and testing a model, iteratively adding or removing features based on model performance.
4.4.2.1. Recursive Feature Elimination (RFE): Recursively build models, starting with all features and removing the least important ones (based on a chosen model) until a specified number of features is reached.
Example: In a dataset with 20 features, you might start with all of them and iteratively remove the least significant feature according to the model's performance, ending up with the top 5 most important features.
Action: Retain the most impactful features and discard the rest.
4.4.2.2. Forward Selection: Start with an empty set of features, and iteratively add the feature that improves model performance the most until adding more features doesn't lead to significant improvement.
Example: In a dataset, begin with no features, add feature1, and see the improvement. If adding feature2 further improves performance, keep it; otherwise, stop.
Action: Build a model with the best subset of features.
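Forward selection can be sketched with a plain least-squares model in NumPy. The helper names, the min_gain stopping threshold, and the synthetic data are all assumptions; for brevity the sketch scores on the training data, whereas in practice a validation set or cross-validation should be used.

```python
import numpy as np

def r2_ols(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_selection(X, y, min_gain=0.01):
    """Greedily add the feature that raises R^2 most; stop when the gain is small."""
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: r2_ols(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_score < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected

# Synthetic data: only features 0 and 2 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=200)

chosen = forward_selection(X, y)
print(chosen)
```

The procedure picks the strongest feature first, then the second, and stops once the remaining features add less than min_gain to the score.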
4.4.3. Embedded Methods
These methods perform feature selection during the model training process. Some machine learning algorithms have built-in mechanisms for feature selection, an approach akin to 'build as you fly'.
4.4.3.1. Lasso Regression (L1 Regularization): Apply a penalty to the coefficients of the regression model that tends to shrink the coefficients of less important features to zero.
Example: In a regression problem with many features, Lasso might reduce the coefficients of irrelevant features to zero, effectively performing feature selection.
Action: Keep only the features with non-zero coefficients.
4.4.3.2. Tree-Based Methods (e.g., Random Forest, Gradient Boosting): Use the feature importance scores derived from tree-based models to select relevant features.
Example: Train a Random Forest on a dataset and evaluate the feature importance scores. If feature3 has an importance score close to zero, it might be irrelevant.
Action: Discard features with low importance scores.
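The Lasso-based selection in 4.4.3.1 can be sketched with scikit-learn, assuming it is installed. The data is synthetic, with only the first two features influencing the target, and alpha=0.1 is an arbitrary choice that would normally be tuned (for example, with cross-validation).

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only features 0 and 1 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty (strength alpha) pushes coefficients of weak features to exactly zero.
model = Lasso(alpha=0.1).fit(X, y)

# Indices of the features that survived, i.e. have non-zero coefficients.
kept = np.flatnonzero(model.coef_)
print(model.coef_.round(3))
print(kept)
```

Because the penalty depends on coefficient size, features should be on comparable scales before fitting; here they are all standard normal by construction.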
Practical Example of Feature Selection: Imagine we are working on a dataset to predict customer churn, and we have 10 features: age, income, years_with_company, monthly_spend, contract_type, customer_support_calls, is_active, region, gender, and last_purchase_amount.
This is how a typical feature selection will go:
Filter Method (Correlation): Calculate the correlation of each feature with the target variable (churn).
Suppose we find that age and income have very low correlations with churn.
Action: Remove age and income from the dataset.
Wrapper Method (RFE): Use Recursive Feature Elimination with a logistic regression model to evaluate the remaining features.
Suppose we find that contract_type, customer_support_calls, and monthly_spend are the most predictive features.
Action: Select these three features for the final model.
Embedded Method (Random Forest): Train a Random Forest model and obtain feature importance scores.
Suppose we find that last_purchase_amount and years_with_company have very low importance.
Action: Exclude these low-importance features from the model.
Summary: Feature selection is a crucial step in building efficient and effective machine learning models. By applying methods such as filter, wrapper, and embedded techniques, you can identify the most relevant features, reduce noise, and improve model performance. This process helps streamline your model, making it more accurate and interpretable.
4.5. Feature Learning: Feature Learning is the process by which a machine learning model automatically discovers the most relevant features from raw data during the training process. Unlike manual feature engineering, where data scientists or engineers explicitly create features, feature learning allows the model to identify and learn the best representations of the data that will improve its performance on a given task.
Why Feature Learning is Important:
'Automatic Feature Discovery' sounds quite similar to what we discussed about Deep Learning in Module 1. A common question from students in my AI class is, "Why do we have to learn feature engineering? I thought you said Deep Learning promises to eliminate the need for it?"
They have a point. Deep Learning does promise to automate feature creation, which is why it is sometimes also called feature learning.
However, we are not yet at a stage where all feature engineering can be automated. While feature learning, particularly through deep learning, has automated the discovery of many relevant features, there are still several important reasons why manual feature engineering remains valuable and sometimes necessary:
4.5.1. Domain Expertise: Manual feature engineering allows experts to inject domain-specific knowledge into the model. This can be crucial in fields like healthcare, finance, or engineering, where understanding the intricacies of the data can lead to more meaningful features that are directly relevant to the problem at hand.
4.5.2. Interpretability: Models based on manually engineered features are often easier to interpret. In regulated industries, it is important to explain how a model arrives at its decisions. Manually crafted features can be more transparent and interpretable, helping to satisfy these requirements.
4.5.3. Resource Efficiency:
Computational Resources: Deep learning models, particularly those used for feature learning, can be computationally expensive and require significant amounts of data to train effectively. In situations where computational resources are limited, or data is scarce, manual feature engineering can provide good results with simpler models that are faster and cheaper to train.
Small Datasets: When the available dataset is small, deep learning might overfit or fail to generalize well. In such cases, manually engineered features can help in creating robust models that perform better on limited data.
4.5.4. Enhanced Model Performance:
Augmenting Deep Learning: Even in deep learning pipelines, manual feature engineering can be used to create features that are difficult for a neural network to learn on its own. For example, calculating domain-specific ratios, aggregations, or time-based features can provide additional signals that enhance the model’s performance.
Hybrid Approaches: In many real-world applications, a hybrid approach combining manual feature engineering with feature learning can yield better results. Engineers might first create a set of well-understood, meaningful features and then allow a deep learning model to learn additional complex features from the raw data.
4.5.5. Generalization Across Different Tasks:
Transferability: Manually engineered features can sometimes be more easily transferable across different tasks or models. For example, a feature engineered to capture seasonality in sales data might be useful across various models and use cases, regardless of the specific deep learning architecture being employed.
Consistency Across Models: Manually engineered features provide consistency across different models, allowing easier comparison and combination of different algorithms (e.g., using ensemble methods). This consistency can be lost when relying solely on deep learning, where features are learned in a way that is specific to the particular architecture and training process.
4.5.6. Data Types Not Suited for Deep Learning:
Structured Data: Deep learning excels at unstructured data like images, text, and audio. However, for structured data (e.g., tabular data in spreadsheets or databases), traditional machine learning methods often perform well with carefully engineered features, sometimes even outperforming deep learning.
Sparse Data: In cases where data is sparse or contains a lot of missing values, manually engineered features can help make the data more usable for a machine learning model, whereas deep learning might struggle.
4.5.7. Practical and Business Constraints:
Time to Market: Manual feature engineering can sometimes lead to faster deployment of models, especially when there is a pressing business need. Waiting for a deep learning model to be trained, tuned, and validated might not be feasible in time-sensitive situations.
Regulatory and Compliance Requirements: In some industries, regulations might require specific features to be used or require that the decision-making process be fully understandable. Manually engineered features can be more easily documented and justified in these contexts.
While deep learning and feature learning have greatly reduced the need for manual feature engineering in many applications, there are still significant scenarios where manual feature engineering is valuable or even necessary. Domain expertise, interpretability, resource efficiency, the ability to handle small or structured datasets, and practical business considerations all contribute to the continued relevance of manual feature engineering in the broader landscape of machine learning. Combining the strengths of both approaches—manual feature engineering and feature learning—often results in the best outcomes.
Types of Feature Learning:
Unsupervised Feature Learning: The objective is to learn features from the data without using labels.
Techniques: Autoencoders, Principal Component Analysis (PCA), and clustering methods like k-means are examples of unsupervised feature learning. These methods learn representations by finding patterns, compressing data, or grouping similar data points together.
Example: An autoencoder might learn to compress and reconstruct images, where the compressed representation (latent space) acts as learned features that capture essential aspects of the images.
Supervised Feature Learning: The objective is to learn features that are directly useful for a predictive task with labeled data.
Techniques: Deep neural networks, such as Convolutional Neural Networks (CNNs) for images and Recurrent Neural Networks (RNNs) for sequences, are examples of supervised feature learning. These models learn hierarchical features that are optimized for the task, such as image classification or sentiment analysis.
Example: In a CNN, the lower layers might learn to detect edges and textures in images, while the higher layers combine these to recognize more complex structures like faces or objects.
Semi-Supervised and Self-Supervised Feature Learning: The objective is to learn features from a mixture of labeled and unlabeled data, or through labels generated from the data itself.
Techniques: Techniques like contrastive learning, generative adversarial networks (GANs), and self-supervised learning methods fall into this category. They leverage large amounts of unlabeled data to learn useful features that can then be fine-tuned with labeled data.
Example: In self-supervised learning, a model might be trained to predict missing parts of an image or the next word in a sentence. The learned features from this task can then be fine-tuned for specific downstream tasks like image classification or language translation.
Examples of Feature Learning through Applications:
Image Recognition (CNNs): A CNN learns features such as edges, textures, and shapes in the early layers, which are then combined into more abstract representations like eyes or wheels in deeper layers. Finally, the model combines these high-level features to recognize entire objects like cars or faces.
The model automatically discovers and learns the most relevant features needed to classify images accurately.
Natural Language Processing (Word Embeddings): Word embedding techniques like Word2Vec or BERT learn dense vector representations of words based on their context in large text corpora. These vectors capture semantic meanings and relationships between words, such as “king” being related to “queen” or “Paris” being related to “France.”
The learned word embeddings can be used as features for various NLP tasks, such as sentiment analysis, machine translation, or text classification.
Audio Processing (Spectrograms in Deep Learning): In speech recognition, raw audio data is converted into spectrograms, which are then fed into deep learning models. The model learns features that capture important patterns in the audio, like phonemes or syllables, which are used to transcribe speech.
The model learns to extract features that are crucial for understanding and processing spoken language.
Feature engineering remains a cornerstone of effective machine learning, bridging the gap between raw data and powerful predictive models. While advancements in deep learning have introduced automated feature learning, the value of manual feature engineering persists, particularly in scenarios requiring domain expertise, interpretability, and resource efficiency. By thoughtfully crafting and selecting features, data scientists can significantly enhance model performance, ensuring that even simple models can yield robust and actionable insights. Ultimately, the art and science of feature engineering continue to play a vital role in achieving superior results, whether as a standalone practice or in tandem with modern feature learning techniques.
