By Enzo García.
In the world of software development, “Artificial Intelligence” is often synonymous with massive neural networks, expensive GPUs, and weeks of model training. But for many business applications, that approach is overkill.
At weKnow Inc., we believe in the principle of Right-Sized Engineering. Recently, we implemented a high-performance recommendation system for a Real Estate client. The goal? To solve the “Cold Start” problem and deliver hyper-personalized property alerts without the overhead of “Heavy AI.”
Here is a deep dive into how we built a Two-Stage Content-Based Filtering System—the same architectural strategy used by Spotify’s Discover Weekly and Netflix—using a lightweight, statistical learning approach.
Our client needed a way to send users email alerts featuring properties they might like, even if they hadn’t explicitly saved a search. We needed to transform raw behavioral data (clicks and views) into a predictive engine.
To achieve this, we moved beyond simple database queries (SQL) and implemented a pipeline that processes implicit behavioral signals to create “User Decision Vectors”.
The architecture breaks down into three algorithmic components: scoring implicit behavior, building the User Vector, and fuzzy matchmaking against new inventory.
The first step is transforming a user’s “Clickstream” (history of page visits) into a quantifiable “Interest Score”.
We utilize Implicit Feedback. Unlike a “Like” button (explicit feedback), we infer interest based on frequency and recency. However, raw data is noisy. To clean it, we applied two specific mathematical transformations:
1. Logarithmic Smoothing: We calculate the score using:
$\log_{10}(1 + \text{Frequency})$
This prevents outliers—such as a bot or an obsessive user clicking 100 times—from distorting the model. In our model, the difference between 1 and 10 visits is significant, but the difference between 50 and 60 is marginal.
2. Exponential Time Decay: We apply a decay factor of $0.5^{\Delta t / H}$, where $\Delta t$ is the time since the last visit and $H$ is a “Half-Life” of 14 days.
The “What Have You Done Lately?” Philosophy
By setting a Half-Life of 14 days, the value of a user’s interaction drops by 50% every two weeks. This ensures the system prioritizes current behavior over historical data. If a user looked for rentals a year ago but is looking for luxury purchases today, the system adapts automatically.
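For concreteness, here is a minimal sketch of how the two transformations can be combined into a single Interest Score. Multiplying them together, along with the function and variable names, is our illustration rather than the client’s production code.

```python
import math
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 14  # the "H" from the decay formula above

def interest_score(frequency: int, last_visit: datetime) -> float:
    """Turn raw clickstream counts into a decayed Interest Score.

    frequency  -- how many times the user viewed listings with this attribute
    last_visit -- timestamp of the most recent view
    """
    delta_days = (datetime.utcnow() - last_visit).total_seconds() / 86_400  # Δt in days

    smoothed = math.log10(1 + frequency)           # tames bots and obsessive clickers
    decay = 0.5 ** (delta_days / HALF_LIFE_DAYS)   # halves the weight every 14 days
    return smoothed * decay

# Ten clicks today vs. ten clicks two weeks ago: same smoothed value,
# but the older signal is worth half as much.
print(interest_score(10, datetime.utcnow()))                       # ≈ 1.04
print(interest_score(10, datetime.utcnow() - timedelta(days=14)))  # ≈ 0.52
```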
Once we have the scores, we need to understand who the user is. Instead of traditional Collaborative Filtering, which requires massive matrices of User-Item data, we used Feature Weighted Averaging to calculate the “Center of Gravity” of the user’s interests.
This process results in a User Vector ($V_u$). For example, to determine the user’s Price Target ($P_{target}$), we do not use a simple arithmetic average, which would be a statistical error. Instead, we use a mean weighted by the Interest Score ($S$):

$$P_{target} = \frac{\sum_i S_i \cdot P_i}{\sum_i S_i}$$

where $P_i$ is the price of the $i$-th property the user interacted with.
The Impact: This creates a centroid that shifts dynamically. If a user viewed a $5M mansion a year ago (low Score due to time decay) but viewed ten $200k condos yesterday (high Score), the centroid will firmly settle around $200k. We also use weighted counters to classify users into behavioral clusters: Buyer, Renter, or Hybrid.
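As an illustration of this weighted-averaging step, the sketch below computes the price centroid and a behavioral cluster from a list of scored interactions. The `Interaction` shape, the 80% threshold for Buyer/Renter, and the numbers are assumptions for the example, not the client’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    price: float        # price of the property the user viewed
    score: float        # Interest Score from the previous step
    listing_type: str   # "sale" or "rent"

def price_target(interactions: list[Interaction]) -> float:
    """Score-weighted mean: the 'center of gravity' of the user's price interest."""
    total_score = sum(i.score for i in interactions)
    return sum(i.price * i.score for i in interactions) / total_score

def behavioral_cluster(interactions: list[Interaction]) -> str:
    """Classify the user as Buyer, Renter, or Hybrid using weighted counters."""
    sale = sum(i.score for i in interactions if i.listing_type == "sale")
    rent = sum(i.score for i in interactions if i.listing_type == "rent")
    total = sale + rent
    if sale / total > 0.8:
        return "Buyer"
    if rent / total > 0.8:
        return "Renter"
    return "Hybrid"

# An old $5M view decayed to a tiny score; ten fresh $200k views dominate the centroid.
history = [Interaction(5_000_000, 0.05, "sale")] + [Interaction(200_000, 1.0, "sale")] * 10
print(round(price_target(history)))  # ≈ 223,881 — firmly in the $200k range
```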
The final stage is the “Matchmaking.” How do we query the database for new properties (Cold Start Items) that match this profile?
Standard SQL queries are binary and rigid (e.g., WHERE price = $200k). Human preference is not rigid. To mimic human cognition, we implemented an OpenSearch Function Score Query using a Gaussian Decay Function.
We rank candidate properties by applying a Bell Curve centered on the user’s $P_{target}$: a property at price $P$ scores on the order of $\exp\left(-\frac{(P - P_{target})^2}{2\sigma^2}\right)$, with $\sigma$ derived from the scale and decay values configured on the query.
This allows for “Fuzzy Ranking”: a property priced exactly at $P_{target}$ receives the maximum score, nearby prices receive progressively lower scores, and nothing is cut off at an arbitrary threshold. The system understands that “close is good enough,” providing a much more organic search experience than rigid database filters.
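To make the matchmaking stage concrete, here is a sketch of a Function Score Query with a Gaussian decay on price, built with the opensearch-py client. The index name, field names, and the scale/decay values are illustrative assumptions rather than the client’s actual configuration.

```python
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

p_target = 200_000  # the user's price centroid from the previous stage

query = {
    "query": {
        "function_score": {
            "query": {"term": {"listing_type": "sale"}},  # pre-filter by behavioral cluster
            "functions": [
                {
                    "gauss": {
                        "price": {
                            "origin": p_target,  # peak of the bell curve
                            "scale": 50_000,     # at ±$50k the score drops to `decay`
                            "decay": 0.5,
                        }
                    }
                }
            ],
            "boost_mode": "multiply",
        }
    }
}

results = client.search(index="properties", body=query)
```

The scale/decay pair controls how forgiving the bell curve is: with the values above, a property $50k away from the target keeps half of the maximum score instead of being excluded outright.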
At weKnow Inc., we advise clients to choose the right tool for the job. For this project, a Lightweight AI approach offered distinct advantages over “Heavy” Deep Learning.
| Feature | Heavy AI (Deep Learning) | Our Lightweight Engine |
| --- | --- | --- |
| Training | Requires days of training (model.fit) with millions of data points. | No training required. Calculates the model “on the fly” in milliseconds. |
| Infrastructure | Needs expensive GPUs and high RAM. | Runs on standard CPUs or Serverless functions. |
| Transparency | “Black Box.” Hard to explain why a recommendation was made. | “White Box.” Fully explainable. We know a house was recommended because the Price Score was 0.9. |
| Data Needs | Needs massive datasets to avoid errors. | Works with “Cold Start” data. Effective after just ~3 clicks. |
This system is, in essence, an intelligent engine that learns from immediate user interaction to predict future preferences. It generalizes patterns (the Centroid) and predicts affinity for items the user has never seen.
By leveraging Statistical Learning—vectors, weighted averages, and probability functions—we delivered a solution that provides the intelligence of Machine Learning without the infrastructure costs.
At weKnow Inc., we provide the specialized engineering talent to turn complex data problems into elegant, efficient solutions.