Skip to content

elto21ab/THESIS

Repository files navigation

Code for the paper, "LLMs as Proxy Survey Participants With RAG", by Elias Torjani, Airidas Brikas, and Daniel Hardt (our BSc thesis)

Check out our [abstract-length] paper on it: Market research via persona-induced Large Language Models, or see our poster below as a TL;DR Poster


How to reproduce our experiments with your own data

  1. Export your chat messages from Facebook, Instagram, and/or WhatsApp (instructions below)
  2. Take the surveys to constitute target responses, for the LLMs proxying you in the same surveys.
  3. Clone this repository
  4. Download Ollama [WIP refactoring for performance] Download llama.cpp and a models' gguf quant to run inference locally.
    • Any cloud provider is discouraged to mitigate leakage-risk of sensitive information.

[!EXPORT]

How to get chat messages from Facebook, Instagram, and/or WhatsApp This is a relatively manual process, and Meta will take about a week.

  1. Facebook incl. Instagram --> Account settings --> Download your information --> Download or transfer information --> pick account[s] (incl. Instagram) --> Specific types of information --> choose "Messages" (Get "All time", and in JSON format)
  2. WhatsApp --> Settings --> Chats --> Export chat --> pick your 1-on-1 chats to export "Without media"
  3. Optional: Use Beeper's API to continuosly export new messages, but be aware of our experiment is a snapshot in time.

Note

This fork is a refactored version of this repository, where our original commit history is preserved. This repostitory is meant to make the experiments easier to reproduce with your own data.