Insight Consulting - Scalable Recommendation Engine for Semiconductor Manufacturer

As a data engineer on a long-term consulting engagement with a global semiconductor manufacturer, I helped build the data platform behind a recommendation engine that suggests compatible components. The initiative aimed to streamline product configuration by guiding users toward optimal part combinations based on historical usage patterns and enriched metadata.

Architecture Overview

The diagram below illustrates the end-to-end data architecture and workflow for the recommendation engine, showing how raw data sources are processed through ETL pipelines, transformed into normalized data structures, and fed into the machine learning model to generate actionable recommendations.

Figure: Semiconductor Recommendation Engine Architecture

Project Objective

The core use case was to assist users configuring semiconductor products by suggesting compatible components based on a selected base component. A machine learning model used similarity metrics to evaluate potential pairings. For example, selecting a primary structural or logic component (the "base component") would trigger recommendations for complementary parts.
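
To make the idea of similarity-based pairing concrete, here is a minimal sketch. It assumes cosine similarity over per-component feature vectors; the actual metric and features used in the project are not specified above, and the component IDs and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def recommend(base_id, vectors, top_n=3):
    """Return the top_n (component_id, score) pairs most similar to base_id."""
    base = vectors[base_id]
    scored = [
        (cid, cosine_similarity(base, vec))
        for cid, vec in vectors.items()
        if cid != base_id  # never recommend the base component itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical feature vectors (for example, usage counts per configuration type).
vectors = {
    "BASE-A": [1.0, 0.0, 1.0],
    "PART-X": [1.0, 0.0, 1.0],
    "PART-Y": [0.0, 1.0, 0.0],
    "PART-Z": [1.0, 1.0, 1.0],
}

recommendations = recommend("BASE-A", vectors, top_n=2)
```

Selecting "BASE-A" as the base component ranks "PART-X" first and "PART-Z" second, because their vectors point in the most similar direction to the base vector.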

My Contributions

Foundational Data Pipelines

I developed ingestion and transformation pipelines to consolidate raw data from various internal sources. This included:

Master tables cataloging components with key attributes and hierarchical relationships
Historical configuration data informing the recommendation model
Deduplication and standardization of product metadata within a Delta Lake architecture on Azure Databricks
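
The deduplication and standardization step can be sketched in plain Python. This is an illustration of the logic only, not the Databricks implementation: the field names (part_number, package_type, updated_at) and the "keep the latest record" rule are assumptions.

```python
def standardize(record):
    """Trim whitespace and normalize casing on the fields used for matching."""
    return {
        "part_number": record["part_number"].strip().upper(),
        "package_type": record["package_type"].strip().upper(),
        "updated_at": record["updated_at"],
    }


def deduplicate(records):
    """Keep the most recently updated record per standardized part number."""
    latest = {}
    for rec in map(standardize, records):
        key = rec["part_number"]
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in latest or rec["updated_at"] > latest[key]["updated_at"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["part_number"])


# Hypothetical raw metadata with inconsistent casing and a duplicate part.
raw = [
    {"part_number": " abc-100 ", "package_type": "qfn", "updated_at": "2023-01-05"},
    {"part_number": "ABC-100", "package_type": "QFN ", "updated_at": "2023-03-01"},
    {"part_number": "xyz-200", "package_type": "bga", "updated_at": "2023-02-10"},
]

clean = deduplicate(raw)
```

Standardizing before deduplicating matters here: without it, " abc-100 " and "ABC-100" would be treated as two different parts.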

Data Modeling & Business Output

I designed downstream tables to organize the model's recommendations for business use. Using SQL window functions, I:

Ranked top compatible parts for each base component by model-generated similarity score
Translated technical similarity scores into user-friendly rankings to improve interpretability for sales and product teams
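
The ranking step above can be sketched with a window function. The example runs against an in-memory SQLite database as a stand-in for the Databricks SQL environment (it needs SQLite 3.25 or later for window functions); the table name, column names, and scores are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_scores (base_component TEXT, candidate TEXT, similarity REAL)"
)
conn.executemany(
    "INSERT INTO model_scores VALUES (?, ?, ?)",
    [
        ("BASE-A", "PART-X", 0.91),
        ("BASE-A", "PART-Y", 0.42),
        ("BASE-A", "PART-Z", 0.77),
        ("BASE-B", "PART-X", 0.30),
        ("BASE-B", "PART-Y", 0.88),
    ],
)

TOP_N = 2  # how many recommendations to keep per base component

# ROW_NUMBER() restarts at 1 for each base component and orders candidates
# by descending similarity, turning raw scores into a simple 1..N ranking.
query = """
SELECT base_component, candidate, rnk
FROM (
    SELECT base_component,
           candidate,
           ROW_NUMBER() OVER (
               PARTITION BY base_component
               ORDER BY similarity DESC
           ) AS rnk
    FROM model_scores
)
WHERE rnk <= ?
ORDER BY base_component, rnk
"""

top_recommendations = conn.execute(query, (TOP_N,)).fetchall()
conn.close()
```

The output exposes only the rank, not the raw score, which is one way to give sales and product teams a "1st, 2nd, 3rd choice" view instead of a decimal similarity value. If tied scores should share a rank, RANK() or DENSE_RANK() would be the alternative to ROW_NUMBER().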

Iterative Feature Expansion

As the model evolved, we expanded its feature set to improve accuracy and context:

Parametric Data Integration: Ingested specification data scraped from publicly available product catalogs, including key engineering attributes such as voltage tolerance, package type, and thermal characteristics
Behavioral Feedback Loop: Integrated performance data tracking which recommendations led to successful configurations and conversions. This created a feedback loop in which the observed outcomes of the model's recommendations became inputs for refining later recommendations
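
One simple way to turn behavioral feedback into a model input is a per-pair conversion rate. The sketch below is an assumption about how such a feature could be derived, not a description of the project's actual feature; the event fields (base, recommended, converted) are hypothetical.

```python
from collections import defaultdict


def conversion_rates(events):
    """Share of times each (base, recommended) pair led to a conversion."""
    shown = defaultdict(int)
    converted = defaultdict(int)
    for event in events:
        pair = (event["base"], event["recommended"])
        shown[pair] += 1
        if event["converted"]:
            converted[pair] += 1
    # Every pair in `shown` has at least one event, so division is safe.
    return {pair: converted[pair] / shown[pair] for pair in shown}


# Hypothetical recommendation events logged by the configuration tool.
events = [
    {"base": "BASE-A", "recommended": "PART-X", "converted": True},
    {"base": "BASE-A", "recommended": "PART-X", "converted": False},
    {"base": "BASE-A", "recommended": "PART-Z", "converted": False},
]

rates = conversion_rates(events)
```

The resulting rates can be joined back to the training data by component pair, so pairings that users actually accept gain weight in later model runs.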

Data Quality and Ownership

This engagement marked my first full ownership of an end-to-end ETL pipeline. Over time, I independently handled:

Adding new data sources to production pipelines
Performing regular data quality checks such as row counts, null tracking, and schema validation
Organizing raw model output into actionable datasets for dashboarding and business consumption
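
The three data quality checks named above (row counts, null tracking, schema validation) can be sketched as a single function. The column names, expected types, and minimum row threshold are hypothetical; a production version would run against Delta tables rather than in-memory dictionaries.

```python
def run_quality_checks(rows, expected_schema, min_rows=1):
    """Return a report with row count, per-column null counts, and schema errors."""
    report = {
        "row_count": len(rows),
        "row_count_ok": len(rows) >= min_rows,
        "null_counts": {col: 0 for col in expected_schema},
        "schema_errors": [],
    }
    expected_cols = set(expected_schema)
    for i, row in enumerate(rows):
        missing = sorted(expected_cols - set(row))
        extra = sorted(set(row) - expected_cols)
        if missing:
            report["schema_errors"].append(f"row {i}: missing columns {missing}")
        if extra:
            report["schema_errors"].append(f"row {i}: unexpected columns {extra}")
        for col, expected_type in expected_schema.items():
            if col not in row:
                continue  # already reported as missing
            value = row[col]
            if value is None:
                report["null_counts"][col] += 1
            elif not isinstance(value, expected_type):
                report["schema_errors"].append(
                    f"row {i}: column {col!r} has type {type(value).__name__}"
                )
    return report


# Hypothetical schema and sample batch of model output.
EXPECTED_SCHEMA = {"part_number": str, "similarity": float}
rows = [
    {"part_number": "PART-X", "similarity": 0.91},
    {"part_number": None, "similarity": 0.42},
    {"part_number": "PART-Z", "similarity": "0.77"},  # wrong type on purpose
]

report = run_quality_checks(rows, EXPECTED_SCHEMA, min_rows=1)
```

Returning a report instead of raising on the first failure lets one pipeline run surface every issue at once, which is usually more useful when onboarding a new data source.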

Outcome

The solution enhanced the customer configuration experience on the manufacturer's digital platform by enabling faster, more accurate builds. Technically, it laid the groundwork for a continuously improving system where user behavior contributed to strengthening recommendation accuracy. This project significantly deepened my understanding of applied machine learning pipelines and the translation of model output into scalable, business-driven data architecture.