Benchmark (with up to 10,000 documents) #40

@DanielRuf

Description

I benchmarked luasmith via Podman on Windows 11 Pro, using podman-machine-default on WSL 2. The machine is a Surface Laptop 6 for Business (24 GiB RAM, 1 TB SSD) with an Intel Core Ultra 165H (16 cores / 22 threads), set to the highest performance mode and connected to mains power.

Steps

  • install hyperfine
  • create 3 folders: luasmith-100, luasmith-1000, luasmith-10000
  • create an empty subfolder named content in each of luasmith-100, luasmith-1000, luasmith-10000
  • copy the luasmith binary into each of luasmith-100, luasmith-1000, luasmith-10000
  • create 2 folders: input_md, output_md
  • download and extract 10000-markdown-files.zip
  • copy all md files from 10000-markdown-files to input_md
  • run python3 generate-frontmatter.py
  • sort files in output_md alphabetically
  • copy the first 100 md files from output_md to luasmith-100/content
  • copy the first 1000 md files from output_md to luasmith-1000/content
  • copy all 10000 md files from output_md to luasmith-10000/content
  • run cd luasmith-100 && hyperfine -i "./luasmith blog"
  • run cd luasmith-1000 && hyperfine -i "./luasmith blog"
  • run cd luasmith-10000 && hyperfine -i "./luasmith blog"
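The sort-and-copy steps above can be sketched in Python. This is only an illustration, not the method actually used for the benchmark: the helper name `copy_first_n` is hypothetical, and "alphabetically" is assumed to mean Python's default string ordering, which can differ from how a file manager sorts names.

```python
import shutil
from pathlib import Path


def copy_first_n(src_dir, dst_dir, n=None):
    """Copy the first n .md files (sorted by name) from src_dir to dst_dir.

    n=None copies every .md file. Returns the list of copied file names.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Assumption: plain string ordering stands in for "alphabetically".
    names = sorted(p.name for p in src.iterdir() if p.suffix == ".md")
    selected = names if n is None else names[:n]
    for name in selected:
        shutil.copy2(src / name, dst / name)
    return selected


if __name__ == "__main__":
    # Folder names follow the steps above; does nothing if output_md is absent.
    if Path("output_md").is_dir():
        for count in (100, 1000, 10000):
            copy_first_n("output_md", f"luasmith-{count}/content", count)
```

Because the selection is a prefix of one sorted list, the 100-file set is contained in the 1000-file set, which in turn is contained in the full set, so the three runs differ only in how many documents they process.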

generate-frontmatter.py:

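import os
import random
from datetime import datetime, timedelta

src = "input_md"   # raw markdown files from the corpus
dst = "output_md"  # same files with front matter prepended

os.makedirs(dst, exist_ok=True)

# Note: os.listdir() returns entries in arbitrary order, so the index i
# is not tied to the alphabetical order used later when copying files.
for i, filename in enumerate(os.listdir(src)):
    if not filename.endswith(".md"):
        continue

    with open(os.path.join(src, filename), "r", encoding="utf-8") as f:
        content = f.read()

    # Random date between 2020-01-01 and 1500 days later (unseeded,
    # so every run of this script produces different dates).
    date = datetime(2020, 1, 1) + timedelta(days=random.randint(0, 1500))

    frontmatter = f"""---
title: "Document {i}"
description: "Benchmark file {i} generated from corpus"
date: "{date.strftime('%Y-%m-%d')}"
---

"""

    # Write the front matter followed by the original document body.
    with open(os.path.join(dst, filename), "w", encoding="utf-8") as f:
        f.write(frontmatter + content)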
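If someone wants to regenerate the exact same input set, a seeded variant of the front matter step might look like the sketch below. It is not what was run for the numbers in this issue: the function name `build_frontmatter` and the seed value are assumptions made for illustration.

```python
import random
from datetime import datetime, timedelta


def build_frontmatter(index, rng):
    """Return the front matter block for document `index`, using the given RNG."""
    date = datetime(2020, 1, 1) + timedelta(days=rng.randint(0, 1500))
    return (
        "---\n"
        f'title: "Document {index}"\n'
        f'description: "Benchmark file {index} generated from corpus"\n'
        f'date: "{date.strftime("%Y-%m-%d")}"\n'
        "---\n\n"
    )


if __name__ == "__main__":
    # Hypothetical seed; any fixed value makes the dates repeatable.
    rng = random.Random(42)
    print(build_frontmatter(0, rng), end="")
```

Iterating over `sorted(os.listdir(src))` with one such RNG would additionally make the document numbers follow the alphabetical file order.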

Results

first 100 md files:

Benchmark 1: ./luasmith blog
  Time (mean ± σ):     385.3 ms ±  54.4 ms    [User: 71.5 ms, System: 20.1 ms]
  Range (min … max):   349.9 ms … 529.9 ms    10 runs

first 1000 md files:

Benchmark 1: ./luasmith blog
  Time (mean ± σ):      5.152 s ±  0.519 s    [User: 0.928 s, System: 0.289 s]
  Range (min … max):    4.264 s …  5.708 s    10 runs

all 10000 md files:

Benchmark 1: ./luasmith blog
  Time (mean ± σ):     53.186 s ±  1.947 s    [User: 9.304 s, System: 3.006 s]
  Range (min … max):   50.269 s … 55.711 s    10 runs
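To see how the mean time grows with each tenfold increase in documents, the reported means can be compared directly. The sketch below only does arithmetic on the three hyperfine means above; the function name `growth_factors` is made up for this example.

```python
# Mean wall-clock times reported by hyperfine above, in seconds.
mean_seconds = {100: 0.3853, 1000: 5.152, 10000: 53.186}


def growth_factors(means):
    """Return the ratio of mean times between consecutive document counts."""
    sizes = sorted(means)
    return {(a, b): means[b] / means[a] for a, b in zip(sizes, sizes[1:])}


for (small, large), factor in growth_factors(mean_seconds).items():
    print(f"{small} -> {large} documents: {factor:.1f}x the mean time")
# prints:
# 100 -> 1000 documents: 13.4x the mean time
# 1000 -> 10000 documents: 10.3x the mean time
```

A factor close to 10 for a tenfold increase would indicate roughly linear scaling in the number of documents.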

minimum:

xychart
    title "time in ms per # md documents"
    x-axis "# of documents" [100, 1000, 10000]
    y-axis "time (in ms)" 100 --> 60000
    bar [350, 4264, 50269]

maximum:

xychart
    title "time in ms per # md documents"
    x-axis "# of documents" [100, 1000, 10000]
    y-axis "time (in ms)" 100 --> 60000
    bar [530, 5708, 55711]

per document minimum:

xychart
    title "time in ms per md document"
    x-axis "# of documents" [100, 1000, 10000]
    y-axis "time (in ms)" 0 --> 6
    bar [3.50, 4.264, 5.0269]

per document maximum:

xychart
    title "time in ms per md document"
    x-axis "# of documents" [100, 1000, 10000]
    y-axis "time (in ms)" 0 --> 6
    bar [5.30, 5.708, 5.5711]
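The per-document values in the two charts above are the hyperfine minimum and maximum divided by the number of documents. A small sketch of that arithmetic, with the hypothetical helper name `per_document_ms`:

```python
# (min, max) wall-clock times in milliseconds, taken from the hyperfine ranges above.
ranges_ms = {
    100: (349.9, 529.9),
    1000: (4264.0, 5708.0),
    10000: (50269.0, 55711.0),
}


def per_document_ms(ranges):
    """Divide each (min, max) pair by its document count."""
    return {n: (lo / n, hi / n) for n, (lo, hi) in ranges.items()}


for n, (lo, hi) in per_document_ms(ranges_ms).items():
    print(f"{n} documents: {lo:.3f} ms to {hi:.3f} ms per document")
```

The chart values 3.50 and 5.30 for 100 documents are these results rounded to two decimals.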
