Code for the paper, "LLMs as Proxy Survey Participants With RAG", by Elias Torjani, Airidas Brikas, and Daniel Hardt (our BSc thesis)
Check out our [abstract-length] paper on it: Market research via persona-induced Large Language Models, or see our poster below as a TL;DR

- Export your chat messages from Facebook, Instagram, and/or WhatsApp (instructions below)
- Take the surveys to constitute target responses, for the LLMs proxying you in the same surveys.
- Clone this repository
Download Ollama[WIP refactoring for performance] Download llama.cpp and a models' gguf quant to run inference locally.- Any cloud provider is discouraged to mitigate leakage-risk of sensitive information.
[!EXPORT]
How to get chat messages from Facebook, Instagram, and/or WhatsApp
This is a relatively manual process, and Meta will take about a week.
- Facebook incl. Instagram --> Account settings --> Download your information --> Download or transfer information --> pick account[s] (incl. Instagram) --> Specific types of information --> choose "Messages" (Get "All time", and in JSON format)
- WhatsApp --> Settings --> Chats --> Export chat --> pick your 1-on-1 chats to export "Without media"
- Optional: Use Beeper's API to continuosly export new messages, but be aware of our experiment is a snapshot in time.
Note
This fork is a refactored version of this repository, where our original commit history is preserved. This repostitory is meant to make the experiments easier to reproduce with your own data.