Generative Matchmaking
This post details an AI-based matchmaking simulation for dating, leveraging vector-based semantic search for user compatibility recommendations and retrieval-augmented generation (RAG) for co-personalized first-date ideation using OpenAI’s GPT-4, OpenAI’s DALL·E-2, and Chroma, an AI-native open-source vector database. My brother, Dimitri Deychakiwsky, and I put this together over several weekends.
Elevator Pitch
Dating
Popular dating services have converged to an experience where users must swipe, match, chat, and plan to meet. For many users, this results in too much time spent repeatedly swiping, small talking, and planning.
I’m looking for a connection where I can simply ask, “Would you like to meet up for drinks later?” I wish online dating could solely serve the purpose of arranging dates, eliminating the need for aimless small talk. Repeating the same text-based conversations that lead nowhere gets tiresome. Meeting in person right away would allow us to gauge our compatibility and save time. – I’m a 27-year-old female.
Inspired by the emergence of human-like behavior in modern AI, our vision is to create an experience where each user is paired with a personalized AI matchmaking assistant. This assistant can adapt to changes in user preferences with the help of human feedback and works behind the scenes to connect with other user assistants, creating harmonious matches. The goal is to reduce the time users spend searching and increase the time they spend meeting people. We believe that this approach can minimize swipe time, eliminate small talk, and assist in planning. The remainder of this post provides details on a simulated subset of functionality that integrates into the overall effort.
🧑 → 🤖💬 → ❤️ ← 💬🤖 ← 🧑
Framework
We focused on dating as an application of matchmaking. However, we argue that our framework generalizes to a broad spectrum of use cases, e.g., jobs, philanthropy, customer churn, e-commerce, and overall business intelligence.
Simulation
The simulated implementation is available in a public GitHub repository. As a proof-of-concept, we generated synthetic dating user profiles with pictures and produced matches for each of them, all of which are stored in the profiles directory. We also generated assisted date plans for an example pair of matched users.
Generating Synthetic Users
Profiles
We sequentially generated 250 synthetic dating user profiles using OpenAI’s chat completion endpoint with the following configuration settings.
{
"model": "gpt-4-0613",
"max_tokens": 5000,
"temperature": 1.05
}
Initially, our attempt to generate numerous dating profiles in a single inference pass, allowing the model to consider previously generated profiles for uniqueness and diversity, was hindered by output token limit constraints. To address this, we adopted an alternative approach: we chatted with the model to generate a collection of names and subsequently instructed the model to sample from this pool as a seed for profile generation.
[System Message]: You are a helpful assistant.
[User Message]: Generate 10 unique first and last names,
ensuring a balanced diversity.
[Assistant Message]: ...
[User Message]: Select a name randomly and create
a dating profile for the chosen name.
[Assistant Message]: ...
It’s worth noting that the sampling process suffers from positional biases inherent in language models. Nevertheless, we observed that this strategy, at the very least, improved generation of a diverse set of names across all users.
For the generation of the dating profile, we utilized OpenAI’s function calling API, which produced a structured JSON response based on our dating profile schema where we hardcoded the dating location to Los Angeles, California for all users. Note that although our schema is basic, it can be extended to include other pieces of static or dynamic information along with the concept of “dealbreakers” which effectively translates to a hard filter on specified partner preference attributes.
Developers can now describe functions to gpt-4-0613 and gpt-3.5-turbo-0613, and have the model intelligently choose to output a JSON object containing arguments to call those functions. This is a new way to more reliably connect GPT’s capabilities with external tools and APIs. These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. Function calling allows developers to more reliably get structured data back from the model. – OpenAI docs.
After generating the dating profile, we instructed the model to exclusively summarize both the user’s dating profile and their partner preferences.
...
[User Message]: Summarize the user's dating profile.
Include all fields other than partner_preferences.
Output a concise paragraph.
[Assistant Message]: ...
[User Message]: Summarize the user's dating partner_preferences.
Include partner_preferences fields only and nothing else.
Output a concise paragraph.
[Assistant Message]: ...
We did not include the user’s name in the summarization instruction context to maintain generality across users. An example synthetic dating profile follows.
{
"name": "Mia Wong",
"age": 29,
"height": "5'6",
"school": "University of California, Berkeley",
"job_industry": "Technology",
"job_title": "Software Engineer",
"hometown_location": "San Francisco, California",
"dating_location": "Los Angeles, California",
"languages_spoken": [
"English",
"Mandarin"
],
"values": [
"Responsibility",
"Family",
"Ambition"
],
"interests": [
"Cooking",
"Hiking",
"Traveling"
],
"education_level": "Graduate",
"religious_beliefs": "Atheist",
"politics": "Liberal",
"dating_intentions": "Long term, open to short",
"relationship_type": "Monogamy",
"gender": "Woman",
"pronouns": "She/Her/Hers",
"sexuality": "Straight",
"ethnicity": "East Asian",
"has_children": false,
"want_children": true,
"pets": [
"Cat"
],
"zodiac_sign": "Sagittarius",
"mbti_personality_type": "INTJ",
"drinking": "Sometimes",
"smoking": "No",
"marijuana": "No",
"drugs": "No",
"exercise": "Active",
"partner_preferences": {
"minimum_age": 28,
"maximum_age": 35,
"minimum_height": "5'6",
"maximum_height": "6'2",
"has_children": false,
"want_children": true,
"sexuality": "Straight",
"drinking": "Sometimes",
"smoking": "No",
"marijuana": "No",
"drugs": "No",
"exercise": "Active",
"gender": "Man",
"dating_intentions": "Long term, open to short",
"relationship_type": "Monogamy",
"ethnicities": [
"Black African Descent",
"East Asian",
"Middle Eastern",
"Native American",
"Pacific Islander",
"South Asian",
"Southeast Asian",
"White Caucasian"
],
"politics": [
"Liberal",
"Not political"
],
"job_industry": [
"Technology",
"Finance",
"Consulting"
],
"languages_spoken": [
"English"
],
"values": [
"Responsibility",
"Family",
"Ambition"
],
"interests": [
"Cooking",
"Hiking",
"Traveling"
],
"education_level": [
"Undergraduate",
"Graduate"
]
},
"profile_summary": "see below",
"preferences_summary": "see below",
"user_id": "fce07a22-133b-46b9-b187-ca1ae5c0b70e"
}
Profile Summary
The user is a 29-year-old woman who identifies as straight.
She uses she/her pronouns and is of
East Asian ethnicity. She graduated from the University
of California, Berkeley and now works
as a Software Engineer in the technology industry.
Originally from San Francisco, California,
she is now looking to date in Los Angeles, California.
She is an atheistic liberal who values
responsibility, family, and ambition.
She enjoys cooking, hiking, and traveling. Despite her
busy schedule, she maintains an active lifestyle.
She speaks both English and Mandarin fluently.
She owns a cat and wishes to have children although
she doesn't have any yet. She is a
Sagittarius with an INTJ Myers-Briggs Type Indicator.
An occasional drinker, she
doesn't smoke or use marijuana or other drugs.
Her intention is to be in a monogamous
relationship, open to both long term commitments
and causal dating.
Partner Preferences Summary
The user is interested in dating a
man who is between the ages of 28 and 35,
and between 5'6" and 6'2" in height. Her preferred
partner would be someone who does not
have children but is open to having some in the future.
The partner should be straight,
drinks occasionally, doesn't smoke or involve in marijuana
or drugs. She prefers a man who
is active and practices monogamy. Ethnicity- wise, her
interests include a range of ethnic
groups, from Black African Descent, East Asian,
Middle Eastern, Native American, Pacific
Islander, South Asian, Southeast Asian to White Caucasian.
She prefers partners who identify
as Liberal or are not political. The ideal match for her
would work in the Technology, Finance,
or Consulting industries. The primary language should be
English. In terms of values, she looks
for Responsibility, Family, Ambition, and shares interests
in cooking, hiking and traveling.
Potential partners should have at least an undergraduate
level of education.
Images
We generated a user profile picture for every user by invoking OpenAI’s DALL·E-2 text-to-image model. We constructed the prompt by interpolating the following template with several fields from the user dating profile schema. Originally, we included the first and last name but OpenAI, rightly-so, flagged it as unsafe, so we removed it. Note that the prompt for this model must stay under a shorter character limit.
prompt: str = (
f"Dating profile picture of a "
f"{user_profile.height} {user_profile.ethnicity} {user_profile.age} "
f"year old {user_profile.gender} ({user_profile.pronouns}) "
f"with an {user_profile.exercise} physique who works as a "
f"{user_profile.job_industry} professional "
f"that values {' and '.join(user_profile.values[:2])} "
f"who enjoys {' and '.join(user_profile.interests[:2])} who "
f"identifies as {user_profile.religious_beliefs}."
)
Here are 28 examples, the overall quality varies.
Matchmaking
Recommender System
Mainstream dating apps train hybrid recommender systems that model content-based and collaborative filtering effects based on a robust set of features including explicit / implicit user interactions, e.g., swiping, time-on-page, etc. Modern recommender systems are implemented as a conglomerate of individual components that often implement retrieval and ranking as separate tasks.
Researchers have shown that LLMs generalize as zero-shot recommenders but, through ablation studies, found that they’re primarily content-based and that they struggle with non-trivial ranking tasks, likely connected to their inherent positional bias.
Since our study lacked human feedback, we experimented with a prompt-based zero-shot retrieval mechanism but realized this approach would not scale effectively due to context-window limits. Instead, we implemented retrieval as semantic vector search. To rank the retrieved output, we experimented with prompt engineering the ranking task but the results clearly suffered from artifacts of positional bias. Although bootstrapping the ranking task alleviated some concern by reducing variance and thereby stabilizing rankings, the results did not suffice, so we chose to implement a graph inspired ranking heuristic.
Chroma Vector Database
We spun up a local and persistent version of Chroma,
an AI-native open-source vector database. By default,
Chroma uses the Sentence Transformers all-MiniLM-L6-v2
model to create embeddings. This embedding model can create sentence and
document embeddings that can be used for a wide variety of tasks.
This embedding function runs locally, downloading and caching the model files.
We loaded every user’s profile summary, as a document,
into a collection and tagged each entry with the user’s gender
and sexuality as metadata for which we manually enforce as
“dealbreakers” or hard filters.
Chroma’s default embedding model maps documents to $\mathbb{R}^{384}$ (384 dimensional space). We visualized all 250 user dating profile summaries by projecting them down to $\mathbb{R}^{3}$ with the T-SNE algorithm using Tensorflow’s Embedding Projector. The result follows.
We assume that the two global clusters represent gender. An interesting application of these natural language embeddings as a product feature is that users could issue queries to find partners by simply typing, e.g., “I’m looking for a … that plays ultimate frisbee and … “.
Retrieval
Recall that we created two summaries for each user that were mutually exclusive.
To compute an initial set of candidates for a given query user, we searched the Chroma collection
with their partner preferences summary and specified a hard metadata filter based on
their partner preference gender and sexuality. Under the hood, Chroma
executes a similarity / distance lookup on the embedded query text and returns the
closest n_results
or neighbors sorted by ascending distance. Choosing a distance metric and an
appropriate value for the number of neighbors are hyperparameters which can be tuned
to the use case. We configured Chroma to use cosine distance and to return the 25 closest
neighbors.
Ranking
The retrieved set of candidate user profiles for a given user can naively be surfaced as matches for human feedback but some of those candidates may be better matches than others. Borrowing from graph theory, we built a graph by treating users as nodes and adding retrieval-distance weighted edges for every user-candidate pair of nodes connecting a query user to their retrieved candidate set.
We defined compatibility as a bidirectional connection between two users where a retrieved candidate’s retrieved set of candidates includes the query user. These candidates, that point back to query user, are ranked higher than those that don’t. This can be thought of as one level of breadth-first search. To illustrate, user $M$ is compatible with the ordered set of users $\{C, B, P\}$ in the diagram that follows.
Visualizing Matches
To visualize matches, we hacked together a user interface with directions on how to run it outlined in the repo’s readme. A screenshot of Theodore Heath’s matches, ranked from left-to-right, follows. The presence of an asterisk by the name of a match indicates a bidirectional connection (compatibility) while the others are unidirectional matches with one directed edge pointing from Theodore to the candidate match.
The most compatible match recommended for Theodore Heath is Olivia Windsor. Theodore and Olivia being marked as a compatible match means that Theodore’s partner preferences summary is semantically similar to Olivia’s profile summary and that Olivia’s partner preferences summary is semantically similar to Theodore’s profile summary within the embedding geometry. In other words, Theodore is looking for someone like Olivia and Olivia is looking for someone like Theodore based on the information available to us. We can inspect Theodore and Olivia’s profile and partner preferences summaries to qualitatively evaluate the match. Our matching algorithm can be thought of as a soft matcher with a hard filter on preferred sexuality and gender.
Theodore’s Profile Summary
This user is a 29-year-old Software Engineer in the
Technology industry based in Los Angeles,
California, but originally from San Francisco.
He stands 6 feet tall and is a graduate from
the California State University. He identifies as
a straight white Caucasian man who uses
the pronouns He/Him/His, with agnostic religious
beliefs, and a moderate political stance.
He values honesty, humor, and intelligence and
enjoys outdoor activities, video games, and
reading in his spare time. This user is a Pisces
and his MBTI personality type is INTJ.
Despite being occasionally up for a drink, he does
not smoke or use drugs, including marijuana.
He leads an active lifestyle and owns a dog.
His desires for the future include having children,
as he currently doesn't have any. He's also open to
a long-term, monogamous relationship.
His primary languages are English and Spanish.
Theodore’s Partner Preferences Summary
The user is seeking a female partner aged 24 to 31
years who falls between the heights of 5'5"
and 6'0". She doesn't have children but must be
open to having them in the future. An ideal
match would be a straight woman who occasionally
drinks but doesn't smoke or use marijuana or
other drugs. Light to moderate exercise habits are
preferred. Ethnicity-wise, he is open to
dating women of both White Caucasian and Hispanic/Latino
backgrounds. In terms of political
inclination, he prefers a woman with moderate views.
Professionally, he prefers women working
in the Education, Technology, or Healthcare industry.
Fluency in English is a must, and he
values Honesty, Kindness, and Intelligence.
Shared interests might include outdoor activities,
music, or reading. Education is important too,
with a preference for woman who have at least
an undergraduate degree or higher. He is looking
for a long-term relationship that is monogamous.
Olivia’s Profile Summary
This user is a 28 year-old woman standing at 5'6".
She is a UX Designer working in the tech
industry and is a graduate of Stanford University.
She identifies as straight and is interested
in a monogamous, long-term relationship.
Originally from New York, New York, she is currently
dating in Los Angeles, California. She fluently
speaks English and Spanish, and identifies her
ethnicity as white caucasian. Her core values
are creativity, independence, and adventure.
Interests include photography, cooking, and travel.
She identifies as agnostic and liberal.
Although she doesn't have any children yet, she
wants them in the future. She is an active
drinker but doesn't smoke, use marijuana, or drugs.
When it comes to exercise, she's quite active.
She owns a cat and is a Gemini with an
ENFP personality type.
Olivia’s Partner Preferences Summary
The user is looking for a man who is between the ages
of 28 and 35 and stands between 5'8"
and 6'2". He shouldn't have children currently but
should want to have them in the future.
Ideally, he would be straight, an active individual,
and a non-smoker. He's preferred to be
someone who has no problems with drinking but doesn't
use marijuana or any other drugs. In
terms of ethnicity, he can be of any listed ethnicity.
Politically, a liberal or moderate man
would be suitable. The ideal man would work in either
Tech, Finance, or Health sectors, and
English should be among his languages. He should value
kindness, honesty, and be committed.
His interests should include hiking, photography, and
traveling. The user prefers someone who
has at least an undergraduate level of education.
Planning
Once the matches were made, we generated personalized first-date activities and talking points by applying a technique known as retrieval-augmented generation (RAG) where we instructed a LLM to assume a role of an expert creative date planner or conversationalist conditioning the generative task on user information. This is a form of in-context learning where the LLM is fed information retrieved from an external storage system to aid in performing a task. Analogous to how the human brain stores information in long-term and short-term memory, the external system is the AI’s long-term memory whereas the LLM’s context-window is its short-term memory. In the sections that follow, we report results for Theodore and Olivia, our example match from the previous section. The AI system plans based on their information, e.g., who they are, what they like, what they want, what city they’re meeting in, etc. We used a basic Chain of Thought (CoT) prompting strategy that incites the model to break down its reasoning process. There are more advanced prompting strategies that we’ve left to future spikes, e.g., Tree of Thought (ToT). We used the following configuration for both tasks.
{
"model": "gpt-4-0613",
"max_tokens": 5000,
"temperature": 0.0
}
Activities
System Message
You are an expert date planner.
User Message
Brainstorm co-personalized first-date ideas for
two users that matched on our dating service
based on their descriptions below.
Output the result in the form of a letter to them
with 5 ideas for the morning, 5 ideas for the afternoon,
and 5 ideas for the evening signed as
"Your loving AI assistants".
Think step-by-step to explain why every idea
is good for them.
--
Theodore Heath: {Theodore's profile summary}
Olivia Windsor: {Olivia's profile summary}
Assistant Message
Dear Theodore and Olivia,
We are thrilled to assist you in planning some
unique and personalized dates that will
cater to both of your interests and values.
Here are our suggestions:
Morning Ideas:
1. Hiking in the Santa Monica Mountains:
Given both of your active lifestyles and Theodore's
love for outdoor activities, this would be a great way
to start the day. You can enjoy the
beautiful views and engage in deep conversations,
which INTJs like Theodore appreciate.
2. Visit to the Griffith Observatory:
This could be a fun and educational experience for
both of you. As a Pisces, Theodore might enjoy the
celestial aspect, while Olivia's adventurous
spirit would be satisfied.
3. Cooking Class:
Since Olivia enjoys cooking, a morning cooking
class could be a fun and interactive experience.
You could choose a Spanish cuisine class to utilize
both of your language skills.
4. Dog Park Visit:
Theodore could bring his dog and you
both could enjoy a casual morning at
the park. This would also give Olivia
a chance to share her love for animals.
5. Photography Walk in Downtown LA:
Olivia could showcase her photography skills and you
both could explore the city's architecture and street art.
Afternoon Ideas:
1. Tech Museum Visit:
Given your shared industry, a visit to a tech museum could
spark interesting conversations and mutual learning.
2. Beach Picnic:
Pack a lunch and head to the beach.
This would combine Theodore's love for the outdoors
and Olivia's love for travel and adventure.
3. Bookstore Visit:
As Theodore enjoys reading, visiting a bookstore could be a fun
experience. You could each pick a book for the other to read.
4. Video Game Arcade:
Theodore could share his love for video games with
Olivia in a fun, casual setting.
5. Spanish Language Film:
Watching a Spanish language film at a local indie
cinema could be a unique way to use your shared language skills.
Evening Ideas:
1. Star Gazing:
Given Theodore's Pisces sign and Olivia's adventurous spirit,
an evening spent star gazing could be both romantic and enjoyable.
2. Comedy Show:
A comedy show could be a great way to incorporate humor,
which Theodore values.
3. Home Cooked Dinner:
Olivia could cook dinner at home, providing a more
intimate setting. You could play video games
or discuss the books you picked earlier.
4. Wine Tasting:
As both of you enjoy a drink occasionally, a local wine
tasting could be a fun and sophisticated experience.
5. Outdoor Movie Night:
Set up a projector in your backyard and enjoy a movie
under the stars. This combines Theodore's
love for the outdoors and Olivia's love for creativity.
We hope these ideas inspire you to have fun,
engaging, and meaningful dates.
Enjoy your time together!
Your loving AI assistants.
Talking Points
System Message
You are an expert conversationalist.
User Message
Brainstorm ten date talking points for two
users that matched on our dating service
based on their descriptions below. Output
the result in the form of a letter to
them signed as "Your loving AI assistants".
Think step-by-step.
--
Theodore Heath: {Theodore's profile summary} {Theodore's partner preferences summary}
Olivia Windsor: {Olivia's profile summary} {Olivia's partner preferences summary}
Assistant Message
Dear Theodore and Olivia,
We are thrilled to introduce you to each other as we
believe you both share common interests, values, and
future goals. To help you get to know each other better,
we have brainstormed some talking points for your first date.
1. **Backgrounds**: You both have moved from different cities
to Los Angeles. Discuss what brought you to LA and how your
experiences have been so far.
2. **Professions**: You both work in the tech industry.
Discuss your roles, your journey into tech, and how you
see the industry evolving.
3. **Education**: You both are graduates from prestigious
universities. Share your college experiences, favorite subjects,
and how your education has shaped your career.
4. **Outdoor Activities**: You both enjoy outdoor activities.
Discuss your favorite hiking trails, outdoor games, and any
memorable adventures you've had.
5. **Pets**: Theodore, you own a dog and Olivia, you own a cat.
Share your pet stories, their quirks, and how they've enriched
your lives.
6. **Future Plans**: You both desire to have children in the future.
Discuss your thoughts on family, parenting, and how you
envision your future family life.
7. **Books and Games**: Theodore, you enjoy reading and video games.
Olivia, you might find it interesting to know what books he's
currently reading or his favorite video games.
8. **Travel and Photography**: Olivia, you love traveling and photography.
Theodore, you might be interested in hearing about her favorite
travel destinations and her photography experiences.
9. **Cooking**: Olivia, you enjoy cooking. Discuss your favorite dishes
to cook, and perhaps plan a cooking date in the future.
10. **Values**: You both value honesty and intelligence. Discuss what
these values mean to you and how they influence your life decisions.
Remember, the goal is to get to know each other better and find
common ground. Enjoy your date and we wish you both a wonderful time!
Best regards,
Your loving AI assistants.
Cost
The final bill paid to OpenAI for this project was under $80 USD.
Literature Review
- https://arxiv.org/pdf/2203.02155.pdf (OpenAI)
- https://arxiv.org/pdf/2304.03442.pdf (Stanford)
- https://arxiv.org/pdf/2306.02707.pdf (Microsoft)
- https://arxiv.org/pdf/2305.10601.pdf (Google)
- https://arxiv.org/pdf/2201.11903.pdf (Google)
- https://arxiv.org/pdf/2308.10053.pdf (Netflix)
- https://arxiv.org/pdf/2305.08845.pdf (Netflix)
- https://arxiv.org/pdf/2305.07622.pdf (Amazon)
- https://arxiv.org/pdf/2005.11401v4.pdf (Meta)
- https://arxiv.org/pdf/2303.11366.pdf (MIT)