Facial Aware Personalized Advertising (FAPA)

AI/ML Targeted Advertising: A Personalized Approach

Siamak (Ash) Ashrafi
14 min read · May 3, 2024
You look like you need a very expensive sports car 🚘

AI/ML Facial Features for the Perfect Ad

A 3-Pronged Approach to Unlock the Perfect Ad

Imagine walking through the mall when a digital billboard displays an ad that seems to read your mind and pick the perfect outfit for your mood. This research dives into the fascinating world of offline (real-world) advertising, focusing on the power of your face — not for identification, but to unlock the perfect ad based on a combination of facial features, emotions, and even your style. (+ time, date & season)

It’s time for lunch and you must look hungry …

Pushing the Bounds of ML and Targeted Ads

Let’s delve into the technical details, focusing on how the system tailors ads to the person looking at the sign.

Real-Time Ad Delivery: Targeting the Sign Viewer

While the system cannot definitively identify specific individuals, it can leverage real-time audience segmentation techniques. Here’s how it works:

Beyond Demographics: A Multifaceted Look

This research goes beyond traditional methods of ad targeting that rely solely on demographics or purchase history.

  1. Facial Feature Analysis: Similar to traditional methods, the system analyzes specific facial features to glean insights about demographics and potential interests.
    - Age and Gender Estimation: Deep learning models trained on vast datasets can estimate a viewer’s age and gender based on facial features. This provides a basic understanding of the target audience.
  2. Emotion Recognition: The system employs deep learning models trained to recognize emotions like happiness, surprise, or frustration from facial expressions. Frameworks like TensorFlow can be explored for on-device processing.
  3. Appearance Recognition: The system focuses on anonymized aspects of a person’s appearance within the frame, such as:
    - Clothing Styles: Analyzing clothing styles (e.g., sporty jacket, elegant dress) can suggest potential interests or activities.
    - Accessories: Recognizing accessories (e.g., gym bag, designer purse) might reveal brand preferences or hobbies.

Source Code:

Deepface is a lightweight face recognition and facial attribute analysis (age, gender, emotion, and race) framework for Python. It is a hybrid face recognition framework wrapping state-of-the-art models: VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, ArcFace, Dlib, SFace, and GhostFaceNet.

Experiments show that human beings achieve 97.53% accuracy on facial recognition tasks, whereas these models have already reached and surpassed that level.
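As a sketch of how such attribute analysis could feed ad selection: a framework like deepface returns a dictionary of estimated attributes per face (`age`, `dominant_gender`, `dominant_emotion`, etc.). The mapping below, which turns that dictionary into a coarse ad segment, is purely illustrative — the age bands and segment names are our own assumptions, not part of any library.

```python
# Sketch: turning facial-attribute output into a coarse ad segment.
# The dict shape mirrors what a library like deepface's DeepFace.analyze
# typically returns; the segment rules here are illustrative assumptions.

def segment_from_attributes(attrs: dict) -> str:
    """Map anonymized facial attributes to a coarse ad segment."""
    age = attrs.get("age", 0)
    if age < 18:
        band = "teen"
    elif age < 35:
        band = "young-adult"
    elif age < 60:
        band = "adult"
    else:
        band = "senior"
    emotion = attrs.get("dominant_emotion", "neutral")
    return f"{band}/{emotion}"

# Example with a deepface-style result (values are made up):
result = {"age": 28, "dominant_gender": "Woman", "dominant_emotion": "happy"}
print(segment_from_attributes(result))  # young-adult/happy
```

The segment string would then index into an ad inventory, keeping the pipeline free of any identity data.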

In this blog post, we’ll build a facial expression recognition pipeline using the OpenMV Cam H7 Plus board and AWS IoT Core, detailing the steps necessary to deploy and use a pre-trained TensorFlow model on the OpenMV board.

Capturing Customer Attention: A Focus on the Sign Viewer

The system relies on cameras embedded in the digital poster display. Anonymization techniques like differential privacy ensure individual identification is extremely difficult. A computer vision pipeline analyzes the anonymized video stream, focusing on the people currently viewing the sign to extract their visual characteristics:

Attention Detection

The computer vision pipeline employs algorithms to detect when someone is looking at the sign. This can involve techniques like gaze estimation or analyzing head orientation.

Source Code:

Hopenet is an accurate, easy-to-use head-pose estimation network. Models have been trained on the 300W-LP dataset and tested on real data with good qualitative performance.
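A pose estimator like Hopenet outputs yaw, pitch, and roll angles per face. A minimal attention check could then be a threshold on those angles — the ±20° limits below are illustrative assumptions, not tuned values from any deployed system.

```python
# Sketch: deciding whether a viewer is "attending" to the sign from
# head-pose angles (degrees), as produced by a pose estimator such as
# Hopenet. The +/-20 degree thresholds are illustrative assumptions.

def is_looking_at_sign(yaw: float, pitch: float,
                       yaw_limit: float = 20.0,
                       pitch_limit: float = 20.0) -> bool:
    """True when the head is oriented roughly toward the camera/sign."""
    return abs(yaw) <= yaw_limit and abs(pitch) <= pitch_limit

print(is_looking_at_sign(-6.1, 2.3))   # True: nearly frontal
print(is_looking_at_sign(45.0, 0.0))   # False: looking well off to the side
```

In practice this gate would run first, so the attribute and emotion models only spend compute on people who are actually looking at the display.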

The ML model, trained on data that associates these features with potential interests, predicts the preferences of the micro-segment (i.e., the sign viewer) and triggers the display of a targeted advertisement.

Photo by Joshua Earle on Unsplash

Unlocking the Perfect Match

Targeted Ad Delivery: Ads Based on Face, Emotion, & Style

By combining these three aspects, the system tailors ads to a micro-segment of one — you, the current sign viewer:

  • Scenario 1: A young woman with a surprised expression walks by the sign, sporting a trendy backpack. The system, recognizing her age, gender, the surprised look, and the backpack brand, might display an ad for a new smartwatch known for its fitness features.
Look … a new smartwatch known for its fitness features
  • Scenario 2: A man with a tired expression walks by, wearing a business suit. The system, considering his age, gender, the fatigue, and the formal attire, might display an ad for a relaxing massage service or a comfortable pair of shoes.
Look @ comfortable pair of shoes
  • Scenario 3: Starving Savior. Rumbling stomach propels a weary shopper past a digital billboard. A flash of movement catches her eye. The advertised pizza/burger/pasta seems to steam more enticingly. A QR code materializes, offering a discount at a nearby pizzeria. Lunchtime just got delicious.
Lunchtime just got delicious (with a discount!).
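The three scenarios above can be sketched as a first-match rule table that fuses the three prongs — demographics, emotion, and appearance — into one ad choice. The predicates and ad names are illustrative stand-ins, not a real ad-serving policy.

```python
# Sketch: a rule table mirroring the scenarios above. First match wins;
# rules and ad names are purely illustrative.

RULES = [
    (lambda v: v["emotion"] == "tired" and "business suit" in v["items"],
     "relaxing massage service"),
    (lambda v: v["emotion"] == "surprised" and "backpack" in v["items"],
     "fitness smartwatch"),
    (lambda v: v["emotion"] == "hungry",
     "pizzeria lunch discount"),
]

def pick_ad(viewer: dict, default: str = "seasonal promotion") -> str:
    """Return the first ad whose rule matches this viewer's features."""
    for predicate, ad in RULES:
        if predicate(viewer):
            return ad
    return default

viewer = {"age": 42, "gender": "man", "emotion": "tired",
          "items": ["business suit"]}
print(pick_ad(viewer))  # relaxing massage service
```

A learned ranking model would eventually replace a hand-written table like this, but the table makes the feature-to-ad mapping concrete.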

Training the System

Unleashing the Power of High-Resolution Cameras and Deep Learning

Data Acquisition: Capturing Details with High-Resolution Cameras

The system relies on strategically positioned high-definition network cameras within the mall. These cameras boast several key technical features essential for accurate data capture:

Megapixel Resolution — Cameras with megapixel resolution ensure crisp and detailed capture of facial features, clothing styles, and accessories. This translates to a rich dataset for training our deep learning models.

Wide Dynamic Range (WDR) Sensors — Malls present diverse lighting conditions, from brightly lit storefronts to dimly lit hallways. WDR sensors compensate for these variations, guaranteeing consistent image quality for robust model analysis.

Edge Computing with Differential Privacy — Privacy-preserving techniques like differential privacy are implemented directly on the camera using edge computing capabilities. This minimizes data transmission and reduces the risk of exposing identifiable information. Here, the raw video data undergoes anonymization before any off-device transmission occurs.
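The textbook building block behind differential privacy is the Laplace mechanism: add noise calibrated to the query's sensitivity before any number leaves the device. The sketch below applies it to an aggregate count (say, "viewers in one age band this hour"); the epsilon value and the query are illustrative.

```python
# Sketch: the Laplace mechanism, the standard building block of
# differential privacy, applied to an aggregate count released from the
# edge device. The query and epsilon here are illustrative.
import math
import random

def noisy_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise of scale 1/epsilon.

    A counting query has sensitivity 1 (adding or removing one person
    changes it by at most 1), so Laplace(1/epsilon) noise gives
    epsilon-differential privacy for a single release.
    """
    scale = 1.0 / epsilon
    u = random.random() - 0.5                       # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

random.seed(42)
print(noisy_count(100, epsilon=1.0))    # a single noisy reading, rarely exactly 100
many = [noisy_count(100, epsilon=1.0) for _ in range(20000)]
print(sum(many) / len(many))            # averages back toward 100
```

Any one release hides the contribution of any single viewer, while aggregates over time remain useful for analytics.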

Deep Learning Model Training: A Multi-Stage Process

The anonymized video streams captured by the high-resolution cameras feed into a multi-stage deep learning model training process:

  1. Data Preprocessing and Augmentation — The captured video data undergoes preprocessing to ensure consistency. Techniques like normalization and resizing are applied. Additionally, data augmentation can be employed to artificially expand the dataset by creating variations of existing images. This helps the models generalize better and avoid overfitting.
  2. Facial Feature Recognition Model Training — A deep learning model, potentially a Convolutional Neural Network (CNN), is trained on a massive dataset of labeled facial images. This model learns to identify facial landmarks and estimate demographics like apparent age and gender based on the anonymized data. Frameworks like TensorFlow can be used for this purpose.
  3. Emotion Recognition Model Training — Another deep learning model, potentially a recurrent neural network (RNN), is trained to analyze facial expressions and infer emotions like happiness, surprise, or frustration. Frameworks like TensorFlow can be explored for on-device processing, potentially reducing latency and improving privacy.
  4. Object Detection and Recognition Model Training — A separate deep learning model, likely another CNN, focuses on recognizing objects within the frame, such as clothing styles, accessories, and potentially brand logos present on shopping bags. Techniques like transfer learning, where pre-trained models are fine-tuned for specific tasks, can be beneficial here. This model leverages a vast dataset of labeled images containing various clothing items and brand logos.
  5. Continual Learning Integration — The system can be configured for continual learning. As the deployed cameras capture new data containing novel clothing styles or brand logos, the models can be continuously updated and improved over time. This ensures the system remains adaptable to evolving fashion trends and branding within the real world (offline) environment. (see below)
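Stage 1 (preprocessing and augmentation) can be sketched in plain NumPy to keep the idea framework-agnostic — a real pipeline would use tf.data or torchvision transforms, and these three augmentations are only a small illustrative subset.

```python
# Sketch of stage 1 (preprocessing and augmentation) in plain NumPy;
# a real pipeline would use tf.data or torchvision transforms instead.
import numpy as np

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels in [0, 255] to float32 in [0.0, 1.0]."""
    return img.astype(np.float32) / 255.0

def augment(img: np.ndarray) -> list:
    """Cheap label-preserving variants: identity, horizontal flip, and a
    brightness shift. Each variant grows the effective dataset size."""
    flipped = img[:, ::-1]
    brighter = np.clip(img * 1.1, 0.0, 1.0)
    return [img, flipped, brighter]

raw = np.array([[0, 128, 255]], dtype=np.uint8)    # a toy 1x3 "image"
norm = normalize(raw)
print(norm.min(), norm.max())   # 0.0 1.0
print(len(augment(norm)))       # 3
```

Normalization keeps gradients well-scaled during training; augmentation is what lets the models generalize from a finite set of captured frames.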

Mapping Features to Targeted Ads: A Personalized Experience

By analyzing the outputs from the trained models, the system gains valuable insights into the viewer:

  • Facial Features: Estimated age and gender provide a basic understanding of the target audience.
  • Emotions: Recognized emotions can inform the selection of ads that resonate with the viewer’s current state.
  • Appearance: Clothing styles, accessories, and potentially identified brands can hint at potential interests or hobbies, allowing for highly targeted advertising.

This combination of factors allows the system to tailor ads to the individual viewer, creating a more relevant and engaging in-mall (offline, real-world) advertising experience.

Reading the face.

Face detection provides lots of features

Face{boundingBox=Rect(94, 238 - 345, 490), 
trackingId=-1,
rightEyeOpenProbability=0.8734263,
leftEyeOpenProbability=0.99285287,
smileProbability=0.009943936,
eulerX=2.2682128, eulerY=-6.074356, eulerZ=4.808043,
landmarks=Landmarks{
landmark_0=FaceLandmark{type=0, position=PointF(221.54976, 470.18063)},
landmark_1=FaceLandmark{type=1, position=PointF(148.79044, 402.5858)},
landmark_3=FaceLandmark{type=3, position=PointF(131.24284, 376.60953)},
landmark_4=FaceLandmark{type=4, position=PointF(162.24553, 330.65063)},
landmark_5=FaceLandmark{type=5, position=PointF(184.89821, 442.79837)},
landmark_6=FaceLandmark{type=6, position=PointF(208.09193, 385.20282)},
landmark_7=FaceLandmark{type=7, position=PointF(282.59137, 387.56708)},
landmark_9=FaceLandmark{type=9, position=PointF(330.84723, 373.07874)},
landmark_10=FaceLandmark{type=10, position=PointF(251.61719, 320.842)},
landmark_11=FaceLandmark{type=11, position=PointF(266.47348, 435.18048)}},
contours=Contours{Contour_1=FaceContour{type=1, points=[PointF(198.0, 252.0),
... PointF(179.0, 254.0)]},
Contour_2=FaceContour{type=2, points=[PointF(115.0, 322.0),
PointF(121.0, 312.0), PointF(133.0, 304.0), PointF(151.0, 301.0),
PointF(174.0, 300.0)]},
...
Contour_7=FaceContour{type=7, points=[PointF(238.0, 337.0),
PointF(241.0, 334.0), PointF(247.0, 329.0), PointF(256.0, 325.0),
PointF(266.0, 324.0), PointF(276.0, 325.0), PointF(282.0, 328.0),
PointF(285.0, 330.0), PointF(288.0, 332.0), PointF(283.0, 335.0),
PointF(278.0, 338.0), PointF(271.0, 340.0), PointF(261.0, 341.0),
PointF(253.0, 341.0), PointF(245.0, 339.0), PointF(240.0, 338.0)]},
...
Contour_8= ... Contour_15=FaceContour{type=15, points=[PointF(278.0, 410.0)]}}}]

Object Detection and Recognition with Continual Learning: This approach moves away from relying solely on pre-labeled datasets. The system can leverage deep learning models trained on a base set of clothing items and accessories. As the system encounters new brands through bag logo and text recognition using OCR (Optical Character Recognition), it can employ continual learning techniques to dynamically update its object detection and recognition capabilities. Frameworks like TensorFlow with on-device machine learning can be explored for this purpose.

  • YOLO (You Only Look Once): This real-time object detection system offers high accuracy and speed, making it suitable for processing video streams. Frameworks like Darknet can be used for implementing custom YOLO models.
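The continual-learning bookkeeping described above can be reduced to its simplest form: a label registry that grows as OCR surfaces brand names the detector has never seen. The `ocr_text` strings below stand in for real OCR output, and actually fine-tuning the detector on a new class is out of scope for this sketch.

```python
# Sketch: a brand registry that grows as OCR surfaces new brand names.
# The ocr_text values stand in for output from a real OCR engine.

class BrandRegistry:
    def __init__(self, known=None):
        self.known = set(known or [])

    def observe(self, ocr_text: str) -> bool:
        """Normalize an OCR'd brand string; return True if it was new."""
        brand = ocr_text.strip().lower()
        if not brand or brand in self.known:
            return False
        self.known.add(brand)      # new class: queue it for detector retraining
        return True

registry = BrandRegistry({"acme sports"})       # illustrative brand names
print(registry.observe("Nuvo Couture"))   # True -- new brand registered
print(registry.observe("ACME Sports"))    # False -- already known
```

In a full system, each `True` return would trigger collecting examples of the new brand and periodically fine-tuning the object detector, which is the continual-learning loop itself.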

~~~

Unsupervised Learning

Unsupervised Learning for Customer Segmentation in Retail

This section dives into the concept of unsupervised learning and its application in retail for customer segmentation.

Discovering Hidden Patterns

Unsupervised learning is a branch of machine learning (ML) where algorithms identify hidden patterns within unlabeled data. Unlike supervised learning, where data is pre-categorized, unsupervised learning allows the model to explore the data and find its own structure and relationships. Imagine sorting a pile of mixed Legos — you wouldn’t have pre-sorted categories, but by analyzing color, size, and shape, you could gradually group them into sets (e.g., wheels, bricks, doors).

Deep Learning and Feature Extraction

In the retail scenario, the ML system utilizes deep learning, a type of artificial neural network inspired by the human brain. Deep learning excels at recognizing patterns in complex data like images. Here’s how it works:

  1. Data Acquisition: High-definition cameras strategically positioned near store entrances capture anonymized video streams of people entering and exiting.
  2. Image Segmentation: The system segments the video stream into individual images of each person entering the store. Techniques like background subtraction and object detection isolate people from the background.
  3. Feature Extraction: The deep learning model analyzes these images, focusing on extracting features like:
    - Demographics: Age, gender, and potentially even ethnicity can be estimated based on facial features.
    - Clothing Styles: The model can identify clothing types (e.g., sportswear, formal wear) and accessories (e.g., backpacks, handbags). This can provide insights into potential interests or income brackets.
    - Body Language: Advanced models might even analyze posture and gait for subtle clues about emotions or urgency.

Clustering

Based on the extracted features, the unsupervised learning algorithm groups individuals into clusters. These clusters represent segments of customers with similar characteristics, potentially indicating their affinity for specific stores.
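Clustering extracted feature vectors can be sketched with a tiny k-means implementation — production code would reach for scikit-learn's KMeans. Each row below is a made-up feature vector (e.g., estimated age and a "sportswear score"); the data and the choice of k=2 are illustrative.

```python
# Sketch: clustering anonymized viewer feature vectors with a toy k-means
# (production code would use scikit-learn's KMeans). Rows are made-up
# feature vectors, e.g. [estimated_age, sportswear_score].
import numpy as np

def kmeans(points: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Return a cluster label per row; deterministic given the seeded init."""
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

features = np.array([[22, 0.9], [25, 0.8], [24, 0.95],    # young, sporty
                     [58, 0.1], [61, 0.05], [55, 0.2]])   # older, formal
labels = kmeans(features, k=2)
print(labels)  # the first three rows share one label, the last three the other
```

The resulting clusters are the unlabeled "segments" — the algorithm never needed anyone to name them in advance, which is the defining property of unsupervised learning.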

Benefits of Unsupervised Learning for Customer Segmentation

  • Unbiased Insights: Unsupervised learning avoids pre-existing biases present in labeled data, allowing for a more objective segmentation based on observed customer behavior.
  • Discovery of Unknown Segments: This approach can reveal previously unidentified customer groups with unique shopping preferences.
  • Improved Marketing Strategies: By understanding customer segments, retailers can tailor marketing campaigns and product displays to resonate better with each group.

This approach offers a powerful tool for retailers to gain a deeper understanding of their customer base and optimize their marketing strategies for improved customer engagement and sales.

Capturing Data @ StoreFront

The computer vision pipeline employs algorithms like object detection and classification to identify people entering the frame. Once a person is identified, the system focuses on extracting relevant features for analysis. This can include:

  • Facial features: Deep learning models, like Convolutional Neural Networks (CNNs), trained on vast anonymized datasets, can estimate a person’s age, gender, and potentially even emotional state based on facial expressions. Techniques like differential privacy are applied directly on the camera to ensure this anonymized data protects user identity.
  • Clothing styles and accessories: The system analyzes the person’s clothing and accessories to glean potential interests or shopping preferences. Another deep learning model, potentially another CNN, can be trained to recognize various clothing styles, brands (through logo recognition), and accessories present in the video stream. Techniques like transfer learning, where pre-trained models are fine-tuned for specific tasks, can be beneficial here.

By analyzing this combination of features, the ML system learns to associate specific characteristics with a higher likelihood of a person entering a particular store. Over time, this ongoing analysis allows the system to develop a map between these features and the stores themselves. This “map” allows the system to predict, with increasing accuracy, which store a person is likely to enter based solely on their anonymized visual data.
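The feature-to-store "map" described above can be sketched as a simple co-occurrence counter: log which stores each anonymized feature tuple precedes, then predict the most frequent one. The feature tuples and store names are illustrative; a deployed system would use a learned classifier rather than raw counts.

```python
# Sketch: the feature-to-store "map" as a co-occurrence counter. Feature
# tuples and store names are illustrative stand-ins.
from collections import Counter, defaultdict

class StorePredictor:
    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, features: tuple, store: str):
        """Log one observed (features, store-entered) pair."""
        self.counts[features][store] += 1

    def predict(self, features: tuple):
        """Most frequently co-occurring store, or None if unseen."""
        if features not in self.counts:
            return None
        return self.counts[features].most_common(1)[0][0]

p = StorePredictor()
p.record(("young-adult", "sportswear"), "SneakerHub")
p.record(("young-adult", "sportswear"), "SneakerHub")
p.record(("young-adult", "sportswear"), "TechStop")
print(p.predict(("young-adult", "sportswear")))  # SneakerHub
print(p.predict(("senior", "formal")))           # None
```

Predictions sharpen as more (features, store) pairs accumulate, which is exactly the "increasing accuracy over time" behavior described above.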

Example Images

Running our high-definition cameras in a mall, we train the system on faces, stores visited, and logos from purchased bags, mapping visual features to what people are buying. (Green Bounding Boxes 👇🏾)

The system learns relationships between face, style, and the places people shop. Photo reference at bottom*

Building on the Edge (AI/ML) Display

AI/ML Digital Poster Display.

We combine …

Poster Display / Outdoor & Storefront Signs

Privacy / Data Security / Transparency

Privacy-Preserving Techniques

Anonymization techniques like differential privacy ensure individual identification is extremely difficult. Facial data is anonymized before any analysis, and only aggregate information about features, emotions, and clothing styles is used.

Data Security

Robust data security measures are crucial. User data should be stored securely and anonymized before any analysis.

Transparency and User Consent

Users should be clearly informed about data collection and its usage for ad targeting. Opt-in mechanisms and clear communication are essential.

The Road Ahead

Refining a Multifaceted Approach

This research explores the potential of using a combination of facial features, emotions, and clothing style for personalized real-world advertising. Here are some key areas for ongoing development:

  • Enhancing Accuracy: All three aspects — facial feature analysis, emotion recognition, and object detection — require ongoing research to improve accuracy, especially in complex or dynamic situations.
  • Addressing Biases: Facial features, expressions, and clothing choices can vary across cultures. Training data needs to be diverse to avoid cultural biases in all recognition models.
  • Ethical Considerations and User Acceptance: Open communication, user consent, and clear guidelines are crucial for building trust and ensuring ethical implementation.
Looks like you need a vacation 🏝️

Conclusion

The Future of Real-World Ads: Balancing Personalization and Privacy

This research proposes a novel approach to advertising: leveraging computer vision and user anonymity to personalize ads based on a viewer’s facial features, emotions, and clothing style. Unlike traditional facial recognition methods, this system prioritizes user privacy by employing anonymization techniques throughout the process.

Beyond Demographics: A More Nuanced Approach

This research moves beyond demographics and purchase history, offering a richer understanding of the viewer. By analyzing facial features, emotions, and clothing styles, the system can tailor ads to a viewer’s potential interests and emotional state, creating a more relevant and engaging experience.

Technical Hurdles and Ethical Considerations

While this research holds promise, technical limitations like accuracy and cultural bias in recognition models need to be addressed. Additionally, ethical considerations regarding user privacy and transparency are paramount. Open communication, user consent, and robust data security measures are crucial for building trust and ensuring responsible implementation.

The Road Ahead: A Future-Proof Strategy

By prioritizing privacy-preserving techniques, ethical considerations, and ongoing research to improve accuracy and address cultural biases, we can pave the way for a future where AI personalizes the offline shopping experience in a way that respects user privacy. This approach fosters a win-win situation for both advertisers and consumers, creating a more relevant and engaging advertising landscape.

Hyper Intelligent Ads

Letter of Intention for R&D

Outdoor Signage and Window Displays

This letter introduces a research article exploring the potential of AI and machine learning (ML) for personalized advertising on digital signage and window displays. The article, titled “Facial Aware Personalized Advertising (FAPA)”, presents a 3-pronged approach to unlocking the perfect ad.

Imagine capturing a customer’s attention with an ad that seems to read their mind! This research dives into this fascinating concept, focusing on using facial features, emotions, and even clothing style to deliver hyper-targeted advertising in real-world settings.

Here’s a glimpse of what the article covers:

  • Using anonymized facial data to understand a viewer’s demographics and emotions
  • Analyzing clothing styles and accessories to glean potential interests
  • Delivering targeted ads that resonate with the viewer on an individual level

The article also delves into the technical aspects of achieving this, including:

  • High-resolution cameras with features like megapixel resolution and wide dynamic range
  • Deep learning models trained to recognize facial features, emotions, and objects (like clothing styles)
  • Privacy-preserving techniques to ensure user anonymity

Overall, the research proposes a novel approach to advertising that personalizes the experience for the viewer while prioritizing privacy. This can be a significant advancement in the world of digital signage and window displays, creating a more relevant and engaging advertising landscape.

The article acknowledges challenges such as ensuring model accuracy across cultures and addressing ethical considerations around user privacy. It concludes with the importance of ongoing research and transparency to build trust and pave the way for a future of responsible AI-powered advertising.

We believe this research has valuable insights for Outdoor Signage and Window Displays. It presents a potential future where your displays can deliver highly targeted and personalized ads, ultimately enhancing the customer experience.

We encourage you to explore the full article for a deeper dive into this promising technology.

The Road Into the Future of Smart AI/ML Ads

Note: Privacy issues.

The system knows nothing about the person looking at the display. It simply reads facial features, emotion, and style to determine which ad to show …

Zero personal information is used when running the system. Only visual features are used.

Please feel free to leave comments if you feel this system violates your privacy, so we can address your concerns. 👉🏾

~YLabZ.com

*Shopping photos:
