Case Study

Building a Lightweight AI Recommendation Engine for Real Estate.

By Enzo García.

In the world of software development, “Artificial Intelligence” is often synonymous with massive neural networks, expensive GPUs, and weeks of model training. But for many business applications, that approach is overkill.

At weKnow Inc., we believe in the principle of Right-Sized Engineering. Recently, we implemented a high-performance recommendation system for a Real Estate client. The goal? To solve the “Cold Start” problem and deliver hyper-personalized property alerts without the overhead of “Heavy AI.”

Here is a deep dive into how we built a Two-Stage Content-Based Filtering System—the same architectural strategy used by Spotify’s Discover Weekly and Netflix—using a lightweight, statistical learning approach.

The Challenge: From Raw Clicks to Probabilistic Vectors.

Our client needed a way to serve emails to users featuring properties they might like, even if they hadn’t explicitly saved a search. We needed to transform raw behavioral data (clicks and views) into a predictive engine.

To achieve this, we moved beyond simple database queries (SQL) and implemented a pipeline that processes implicit behavioral signals to create “User Decision Vectors”.

The architecture is broken down into three algorithmic components:

1. Signal Processing: The "Time Decay" Algorithm.

The first step is transforming a user’s “Clickstream” (history of page visits) into a quantifiable “Interest Score”.

We utilize Implicit Feedback. Unlike a “Like” button (explicit feedback), we infer interest based on frequency and recency. However, raw data is noisy. To clean it, we applied two specific mathematical transformations:

1. Logarithmic Smoothing: We calculate the score using:

$\log_{10}(1 + \text{Frequency})$

This prevents outliers—such as a bot or an obsessive user clicking 100 times—from distorting the model. In our model, the difference between 1 and 10 visits is significant, but the difference between 50 and 60 is marginal.

2. Exponential Time Decay: We apply a decay factor of $0.5^{\Delta t / H}$, where $\Delta t$ is the time since the last visit and $H$ is a “Half-Life” of 14 days (see the combined sketch at the end of this section).

The “What Have You Done Lately?” Philosophy

By setting a Half-Life of 14 days, the value of a user’s interaction drops by 50% every two weeks. This ensures the system prioritizes current behavior over historical data. If a user looked for rentals a year ago but is looking for luxury purchases today, the system adapts automatically.
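To make the signal-processing step concrete, here is a minimal Python sketch of the two transformations, assuming the combined Interest Score is simply the smoothed frequency multiplied by the decay factor; the function and constant names are illustrative, not the production implementation.

```python
import math
from datetime import datetime, timezone

HALF_LIFE_DAYS = 14  # H: an interaction's value halves every two weeks

def interest_score(frequency: int, last_visit: datetime, now: datetime | None = None) -> float:
    """Turn raw clickstream counts into a decayed Interest Score.

    Logarithmic smoothing keeps outliers (e.g., 100 bot clicks) from
    distorting the model; exponential time decay favors recent behavior.
    """
    now = now or datetime.now(timezone.utc)
    delta_days = (now - last_visit).total_seconds() / 86400  # Δt in days

    smoothed = math.log10(1 + frequency)           # log10(1 + Frequency)
    decay = 0.5 ** (delta_days / HALF_LIFE_DAYS)   # 0.5^(Δt / H)
    return smoothed * decay
```

With this scoring, ten views from three weeks ago end up below three views from yesterday, which is exactly the “What Have You Done Lately?” behavior described above.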

2. User Profiling: Weighted Centroids.

Once we have the scores, we need to understand who the user is. Instead of traditional Collaborative Filtering, which requires massive matrices of User-Item data, we used Feature Weighted Averaging to calculate the “Center of Gravity” of the user’s interests.

This process results in a User Vector ($V_u$). For example, to determine the user’s Price Target ($P_{target}$), we do not use a simple arithmetic average, which would treat every interaction equally. Instead, we use a mean weighted by the Score ($S$):

$P_{target} = \frac{\sum_i S_i \cdot P_i}{\sum_i S_i}$

where $P_i$ is the price of the $i$-th property the user viewed and $S_i$ is its Interest Score.

The Impact: This creates a centroid that shifts dynamically. If a user viewed a $5M mansion a year ago (low Score due to time decay) but viewed ten $200k condos yesterday (high Score), the centroid will firmly settle around $200k. We also use weighted counters to classify users into behavioral clusters: Buyer, Renter, or Hybrid.
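Below is a sketch, under the same assumptions, of how the weighted centroid and the behavioral clusters could be computed; the `Interaction` structure, its field names, and the 0.65 classification threshold are illustrative, not the client’s actual values.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    price: float       # listing price of the viewed property
    score: float       # decayed Interest Score from the previous step
    listing_type: str  # "sale" or "rental"

def price_target(interactions: list[Interaction]) -> float:
    """Score-weighted mean price: the 'center of gravity' of the user's interest."""
    total_score = sum(i.score for i in interactions)
    if total_score == 0:
        raise ValueError("no behavioral signal yet")
    return sum(i.price * i.score for i in interactions) / total_score

def behavioral_cluster(interactions: list[Interaction], margin: float = 0.65) -> str:
    """Weighted counters: classify the user as Buyer, Renter, or Hybrid."""
    sale = sum(i.score for i in interactions if i.listing_type == "sale")
    rent = sum(i.score for i in interactions if i.listing_type == "rental")
    total = (sale + rent) or 1.0
    if sale / total >= margin:
        return "Buyer"
    if rent / total >= margin:
        return "Renter"
    return "Hybrid"

# One stale $5M view (score ≈ 0.05) against ten fresh $200k views (score ≈ 1.0 each)
# pulls the centroid to roughly $224k, firmly in condo territory.
```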

3. Retrieval & Ranking: Gaussian Density Functions.

The final stage is the “Matchmaking.” How do we query the database for new properties (Cold Start Items) that match this profile?

Standard SQL queries are binary and rigid (e.g., WHERE price = $200k). Human preference is not rigid. To mimic human cognition, we implemented an OpenSearch Function Score Query using a Gaussian Decay Function.

We rank candidate properties by applying a Bell Curve centered on the user’s $P_{target}$:

$\text{score}(P) = \exp\left(-\frac{(P - P_{target})^2}{2\sigma^2}\right)$

where $P$ is a candidate’s price and $\sigma$ controls how quickly relevance falls off as the price moves away from the target.

This allows for “Fuzzy Ranking”:

  • Target: A property matching the exact price gets a score of 1.0.

  • Deviation: A property 10% more expensive might get a 0.9, while one 50% more expensive drops to roughly 0.1.

The system understands that “close is good enough,” providing a much more organic search experience than rigid database filters.
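As an illustration of this retrieval step, here is a sketch of an OpenSearch `function_score` query using the built-in `gauss` decay function, sent via the `opensearch-py` client; the index name, field name, connection details, and the `scale`/`decay` values are assumptions for the example, not the client’s configuration.

```python
from opensearchpy import OpenSearch

def build_price_ranking_query(price_target: float) -> dict:
    """function_score query with a Gaussian bell curve centered on P_target."""
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},  # candidate retrieval; hard filters would go here
                "functions": [
                    {
                        "gauss": {
                            "price": {
                                "origin": price_target,         # peak of the bell curve
                                "scale": price_target * 0.25,   # at origin ± scale, the score falls to `decay`
                                "decay": 0.5,
                            }
                        }
                    }
                ],
                "boost_mode": "replace",  # rank purely by the decay score
            }
        }
    }

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])  # placeholder connection
results = client.search(index="properties", body=build_price_ranking_query(200_000))
```

With these example values, a listing at the target price scores 1.0, one about 10% away scores roughly 0.9, and one 50% away falls below 0.1, so the ranking degrades smoothly instead of being cut off by a hard `WHERE` clause.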

Why "Lightweight AI"?

At weKnow Inc., we advise clients to choose the right tool for the job. For this project, a Lightweight AI approach offered distinct advantages over “Heavy” Deep Learning.

| Feature | Heavy AI (Deep Learning) | Our Lightweight Engine |
| --- | --- | --- |
| Training | Requires days of training (model.fit) with millions of data points. | No training required. Calculates the model “on the fly” in milliseconds. |
| Infrastructure | Needs expensive GPUs and high RAM. | Runs on standard CPUs or Serverless functions. |
| Transparency | “Black Box.” Hard to explain why a recommendation was made. | “White Box.” Fully explainable: we know a house was recommended because its Price Score was 0.9. |
| Data Needs | Needs massive datasets to avoid errors. | Works with “Cold Start” data. Effective after just ~3 clicks. |

Conclusion.

This system is, in essence, an intelligent engine that learns from immediate user interaction to predict future preferences. It generalizes patterns (the Centroid) and predicts affinity for items the user has never seen.

By leveraging Statistical Learning—vectors, weighted averages, and probability functions—we delivered a solution that provides the intelligence of Machine Learning without the infrastructure costs.

Are you looking to build intelligent, scalable platforms?

At weKnow Inc., we provide the specialized engineering talent to turn complex data problems into elegant, efficient solutions.