Instacart Grocery Basket

Customer Segmentation with Python

Population Flow

Data Consistency Checks

Objective: Ensure data integrity by resolving inconsistencies, missing values, and duplicates.

Process:

  • Mixed-type variables: Columns containing mixed data types were standardized to a single type.
  • Missing values: Missing values were identified and handled either by imputing values or by removing the affected rows or columns.
  • Duplicate removal: Duplicate records were detected and removed to avoid skewing analysis results.

Outcome: A consistent and accurate dataset, free from anomalies.
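
The checks above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the sample dataframe, its column names, and the choices to drop rows with a missing product name and to impute prices with the median are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "product_name": ["Milk", "Eggs", "Eggs", None, 123],  # str mixed with int, plus a missing value
    "price": [2.5, 3.0, 3.0, 4.0, None],
})

# 1) Mixed types: list columns whose non-missing values have more than one Python type.
mixed_cols = [
    col for col in df.columns
    if df[col].dropna().map(type).nunique() > 1
]

# Standardize each mixed column to strings, leaving missing values untouched.
for col in mixed_cols:
    df[col] = df[col].map(lambda v: v if pd.isna(v) else str(v))

# 2) Missing values: count them per column, then handle them.
missing_counts = df.isna().sum()
clean = df.dropna(subset=["product_name"])  # assumed rule: a row without a product name is unusable
clean = clean.assign(price=clean["price"].fillna(clean["price"].median()))  # assumed rule: median imputation

# 3) Duplicates: count fully identical rows, then drop them.
n_duplicates = int(clean.duplicated().sum())
clean = clean.drop_duplicates()

print(mixed_cols)
print(missing_counts)
print(n_duplicates)
```

Whether to impute or drop depends on the column and on how many values are missing, so the two rules used here should be read as placeholders.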

Data Wrangling & Subsetting

Objective: Clean and subset data to facilitate meaningful analysis.

Process:

  • Data type adjustment and renaming: Identifier variables were converted to appropriate data types (for example, numeric IDs were converted to strings for consistency). Columns were renamed for clarity based on the data dictionary.
  • Value interpretation: A data dictionary was used to determine the meaning of various values to ensure accurate interpretation.
  • Dataframe creation: New dataframes were created based on specific criteria, such as orders placed in a particular month or by specific customer segments.
  • User activity analysis: Variable frequencies were examined to answer questions about user activity, such as which items are ordered most frequently and when orders peak.

Outcome: Cleaned and well-organized data ready for further analysis.
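
A short sketch of these wrangling steps, under stated assumptions: the sample rows, the column names (order_dow, order_hour_of_day), the new column name, and the subsetting rule are hypothetical stand-ins rather than the project's real schema or criteria.

```python
import pandas as pd

# Hypothetical orders table; names and values are illustrative.
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "user_id": [1, 1, 2, 3],
    "order_dow": [0, 6, 0, 3],            # assumed day-of-week code, to be decoded via the data dictionary
    "order_hour_of_day": [9, 14, 9, 20],
})

# Identifiers are converted to strings so they are never summed or averaged by accident.
orders["order_id"] = orders["order_id"].astype(str)
orders["user_id"] = orders["user_id"].astype(str)

# Rename a cryptic column to a clearer name.
orders = orders.rename(columns={"order_dow": "order_day_of_week"})

# Subset: a new dataframe holding only orders that meet a criterion (here, day codes 0 and 1).
early_week = orders[orders["order_day_of_week"].isin([0, 1])]

# Frequencies: count orders per hour and pick the busiest hour.
hour_counts = orders["order_hour_of_day"].value_counts()
busiest_hour = hour_counts.idxmax()

print(early_week)
print(hour_counts)
print(busiest_hour)
```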

Combining & Exporting Data

Objective: Merge multiple dataframes and prepare the data for further analysis.

Process:

  • Merging dataframes: Dataframes were merged on common keys. The frequencies of the merge flag were checked to confirm that the joins succeeded and to identify unmatched records.
  • Data export: The merged dataset was exported as a pickle file for efficient storage and retrieval.

Outcome: A comprehensive dataset ready for analysis, stored in an easily accessible format.
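
The merge-flag check and pickle export might look like the sketch below. The two small tables, the key name product_id, the outer join, and the output file name are assumptions for illustration. The indicator=True argument of pandas merge adds a _merge column whose values (both, left_only, right_only) serve as the merge flag.

```python
import os
import tempfile

import pandas as pd

# Hypothetical tables sharing the key "product_id".
orders = pd.DataFrame({"order_id": [1, 2, 3], "product_id": [100, 101, 999]})
products = pd.DataFrame({
    "product_id": [100, 101, 102],
    "product_name": ["Milk", "Eggs", "Bread"],
})

# An outer join keeps unmatched rows from both sides; indicator=True adds the "_merge" flag.
merged = orders.merge(products, on="product_id", how="outer", indicator=True)

# Flag frequencies show how many rows matched and how many exist on one side only.
flag_counts = merged["_merge"].value_counts()
print(flag_counts)

# Keep the matched rows and drop the flag before exporting.
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")

# Export to pickle and read it back (a temporary directory is used for the example).
path = os.path.join(tempfile.mkdtemp(), "orders_products_merged.pkl")
matched.to_pickle(path)
restored = pd.read_pickle(path)
```

A pickle file preserves dtypes and the index exactly, which is why it round-trips faster and more faithfully than CSV. It should only be loaded from trusted sources.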

Deriving New Variables

Objective: Create new variables to enhance data analysis.

Process:

  • Conditional logic: New columns were created using if-statements, custom functions, the loc[] indexer, and for-loops. One example is a column that flags high-value customers based on their total spend.
  • User activity insights: These new variables provided deeper insight into customer behavior and preferences.

Outcome: Expanded dataset with new insightful variables.
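
The three techniques named above can be compared side by side in a small sketch. The sample data, the spend thresholds (1000 and 200), the hour range, and all label strings are hypothetical and are not taken from the project.

```python
import pandas as pd

# Hypothetical customer-level data.
df = pd.DataFrame({
    "user_id": ["1", "2", "3", "4"],
    "total_spend": [1200.0, 300.0, 50.0, 800.0],
    "order_hour_of_day": [9, 14, 23, 11],
})

# Approach 1: a custom function with if-statements, applied row by row.
def spend_label(row):
    if row["total_spend"] >= 1000:      # assumed threshold
        return "High-value customer"
    elif row["total_spend"] >= 200:     # assumed threshold
        return "Mid-value customer"
    return "Low-value customer"

df["spend_label"] = df.apply(spend_label, axis=1)

# Approach 2: the same rule with loc[], which is vectorized and scales better on large data.
df.loc[df["total_spend"] >= 1000, "spend_flag"] = "High-value customer"
df.loc[(df["total_spend"] >= 200) & (df["total_spend"] < 1000), "spend_flag"] = "Mid-value customer"
df.loc[df["total_spend"] < 200, "spend_flag"] = "Low-value customer"

# Approach 3: a for-loop that builds a list, which is then assigned as a column.
period = []
for hour in df["order_hour_of_day"]:
    if 8 <= hour <= 17:                 # assumed definition of daytime
        period.append("Daytime")
    else:
        period.append("Off-hours")
df["order_period"] = period

print(df)
```

On a dataset with millions of rows, the loc[] version is usually the practical choice, since apply with axis=1 and Python loops process one row at a time.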

Grouping & Aggregating Data

Objective: Aggregate data and create new flags for detailed analysis.

Process:

  • Flag creation: New flag columns were added, such as a loyalty flag for repeat customers.
  • Descriptive statistics: The groupby() method was used to create summary columns (e.g. average order size, total spend per customer).

Outcome: Detailed summary statistics and flags that provide a clearer picture of customer behavior and trends.
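
A minimal sketch of grouping, aggregating, and flagging, assuming a hypothetical orders table with an order_total column. The loyalty cut-offs (3 or more orders, exactly 2 orders) and the label strings are invented for the example.

```python
import pandas as pd

# Hypothetical order-level data.
orders = pd.DataFrame({
    "user_id": ["1", "1", "1", "2", "2", "3"],
    "order_number": [1, 2, 3, 1, 2, 1],
    "order_total": [20.0, 35.0, 50.0, 10.0, 30.0, 80.0],
})

# One row per customer: named aggregations produce the summary columns.
summary = orders.groupby("user_id").agg(
    total_spend=("order_total", "sum"),
    avg_order_total=("order_total", "mean"),
    n_orders=("order_number", "max"),
)

# transform() returns a value for every original row, so the result can be stored as a new column.
orders["max_order"] = orders.groupby("user_id")["order_number"].transform("max")

# Loyalty flag derived from the per-customer maximum (assumed cut-offs).
orders["loyalty_flag"] = "New customer"
orders.loc[orders["max_order"] == 2, "loyalty_flag"] = "Regular customer"
orders.loc[orders["max_order"] >= 3, "loyalty_flag"] = "Loyal customer"

print(summary)
print(orders)
```

The difference between agg() and transform() is the shape of the result: agg() collapses each group to one row, while transform() keeps the original row count.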

Data Visualization

Objective: Visualize data to discover patterns and relationships.

Process:

  • Data preparation: Customer data was imported and merged with the existing dataset.
  • Visualization: Histograms, bar charts, line graphs, and scatterplots were created to show the distributions of individual variables and the relationships between them.

Outcome: Visualizations that highlight key patterns and relationships to help interpret data.
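
As an illustration of two of the chart types listed above, the sketch below draws a histogram and a bar chart with matplotlib. The data, column names, and titles are hypothetical, and the non-interactive Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({
    "order_hour_of_day": [8, 9, 9, 10, 14, 14, 14, 20],
    "order_day_of_week": [0, 0, 1, 1, 2, 5, 6, 6],
})

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of orders across the hours of the day (one bin per hour).
counts, bin_edges, _ = ax_hist.hist(df["order_hour_of_day"], bins=list(range(25)))
ax_hist.set_title("Orders by hour of day")
ax_hist.set_xlabel("Hour")
ax_hist.set_ylabel("Number of orders")

# Bar chart: number of orders per day-of-week code, sorted by day.
day_counts = df["order_day_of_week"].value_counts().sort_index()
ax_bar.bar(day_counts.index, day_counts.values)
ax_bar.set_title("Orders by day of week")
ax_bar.set_xlabel("Day code")
ax_bar.set_ylabel("Number of orders")

fig.tight_layout()
```

Calling fig.savefig with a file name afterwards would write the figure to disk for use in a report.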

Excel Reporting

Objective: Provide a comprehensive report summarizing the analysis and findings.

Process:

  • Customer profiling: New columns and flags have been created to profile customers, such as identifying high-frequency buyers.
  • Order behavior analysis: The ordering behavior of different customer groups was analyzed to identify trends and preferences.
  • Report creation: A detailed report was created summarizing the methodology, analysis, findings and recommendations. This report included insights such as peak shopping times, popular products, and customer retention trends.

Outcome: A well-documented report for Instacart stakeholders that provides actionable insights and recommendations to improve business strategies.
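
One way to compare the ordering behavior of customer groups before moving the numbers into a report is a crosstab. The sketch below is an assumption-laden example: the frequency_flag and department columns, their values, and the row-wise normalization are illustrative choices, not the project's documented method.

```python
import pandas as pd

# Hypothetical order lines, already labeled with a customer profile flag.
df = pd.DataFrame({
    "frequency_flag": ["Frequent", "Frequent", "Frequent",
                       "Occasional", "Occasional", "Occasional"],
    "department": ["produce", "dairy", "produce",
                   "snacks", "produce", "snacks"],
})

# Share of each department within each customer group (every row sums to 1).
share = pd.crosstab(df["frequency_flag"], df["department"], normalize="index")

# Most-ordered department per group.
top_department = share.idxmax(axis=1)

print(share.round(2))
print(top_department)
```

A table like share can be written to a spreadsheet with DataFrame.to_excel, which requires an Excel writer engine such as openpyxl to be installed.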