DASE-DASLab/fdbkeeper-service

fdbkeeper-service

What this project does

fdbkeeper-service is a standalone keeper service. Its goal is to make the keeper's protocol and operational behavior buildable, testable, and deployable on their own.

Prerequisites

  • C++20 compiler (clang++ or g++)
  • CMake >= 3.20
  • Python 3 (for protocol/integration tests)
  • Optional but recommended: ninja

Example (macOS):

brew install cmake ninja python

Example (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build python3

FoundationDB setup

fdbkeeper-service supports two modes:

  • memory backend: no FoundationDB needed
  • fdb backend: requires fdb_c headers/libs and a reachable cluster

Option A: Install prebuilt FoundationDB (recommended)

Install FoundationDB from the official package (or your internal mirror), then verify that the CLI, header, and client library are present:

which fdbcli
ls /usr/local/include/foundationdb/fdb_c.h
ls /usr/local/lib/libfdb_c.dylib   # macOS
# or
ls /usr/local/lib/libfdb_c.so      # Linux
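
The same check can be scripted so that a build wrapper picks the backend automatically. The following is a minimal sketch, not part of this repository: the function name find_fdb_c is hypothetical, and it only looks at the two paths used in the commands above, relative to an install prefix.

```python
from pathlib import Path


def find_fdb_c(prefix="/usr/local"):
    """Locate the fdb_c header and shared library under an install prefix.

    Returns a dict whose values are Path objects, or None for a missing piece.
    """
    root = Path(prefix)
    header = root / "include" / "foundationdb" / "fdb_c.h"
    # macOS installs a .dylib and Linux a .so; accept whichever exists.
    candidates = [root / "lib" / name for name in ("libfdb_c.dylib", "libfdb_c.so")]
    library = next((p for p in candidates if p.is_file()), None)
    return {
        "header": header if header.is_file() else None,
        "library": library,
    }
```

If both entries are non-None, configuring with -DFDBKEEPER_ENABLE_FDB_C_API=ON is worth trying; otherwise the memory backend still builds without FoundationDB.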

Option B: Build FoundationDB from source

git clone https://github.com/apple/foundationdb.git
cd foundationdb
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

Install client/server artifacts (example prefix /usr/local):

cmake --install build --prefix /usr/local

After install, verify:

fdbcli --version
test -f /usr/local/include/foundationdb/fdb_c.h && echo "fdb_c.h OK"

Start a local FoundationDB cluster (for integration testing)

If your environment already has a running cluster, skip this section.

Minimal single-node local cluster example:

mkdir -p /tmp/fdb-local/data /tmp/fdb-local/log
echo 'local:localsecret@127.0.0.1:14500' > /tmp/fdb-local/fdb.cluster

/usr/local/libexec/fdbserver \
  --cluster-file /tmp/fdb-local/fdb.cluster \
  --listen-address 127.0.0.1:14500 \
  --public-address 127.0.0.1:14500 \
  --datadir /tmp/fdb-local/data \
  --logdir /tmp/fdb-local/log
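
The cluster file written above follows FoundationDB's description:ID@host:port[,host:port...] layout, and the coordinator address in it must match the --listen-address passed to fdbserver. A small sketch for checking that from a test script is shown below; parse_cluster_line is a hypothetical helper, and it deliberately does not handle addresses with a :tls suffix.

```python
def parse_cluster_line(line):
    """Split 'description:id@host:port[,host:port...]' into its parts.

    Returns (description, cluster_id, [(host, port), ...]).
    Addresses carrying a ':tls' suffix are not supported by this sketch.
    """
    line = line.strip()
    ident, _, addresses = line.partition("@")
    description, _, cluster_id = ident.partition(":")
    if not (description and cluster_id and addresses):
        raise ValueError(f"not a cluster file line: {line!r}")
    coordinators = []
    for address in addresses.split(","):
        # rpartition keeps bracketed IPv6 hosts such as [::1] intact.
        host, _, port = address.strip().rpartition(":")
        coordinators.append((host, int(port)))
    return description, cluster_id, coordinators
```

For the file above, the single coordinator should come back as ("127.0.0.1", 14500), the same address the server listens on.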

In another terminal:

fdbcli --no-status -C /tmp/fdb-local/fdb.cluster \
  --exec "configure new single memory tenant_mode=optional_experimental; status"

Set cluster file for this project:

export FDB_CLUSTER_FILE=/tmp/fdb-local/fdb.cluster

Build

Quick local build:

cmake -S . -B build
cmake --build build -j
ctest --test-dir build --output-on-failure

Enable FoundationDB C API backend:

cmake -S . -B build-fdb -DFDBKEEPER_ENABLE_FDB_C_API=ON
cmake --build build-fdb -j
ctest --test-dir build-fdb --output-on-failure

Run selected tests only:

ctest --test-dir build -R protocol_smoke --output-on-failure
ctest --test-dir build -R cli_behavior --output-on-failure
ctest --test-dir build-fdb -R fdb_real_integration --output-on-failure

Enable live-cluster integration test (requires reachable cluster file):

cmake -S . -B build-fdb-live \
  -DFDBKEEPER_ENABLE_FDB_C_API=ON \
  -DFDBKEEPER_ENABLE_REAL_FDB_INTEGRATION_TEST=ON \
  -DFDBKEEPER_REAL_CLUSTER_FILE=/usr/local/etc/foundationdb/fdb.cluster
cmake --build build-fdb-live -j
ctest --test-dir build-fdb-live --output-on-failure

This mode runs:

  • fdb_real_integration: focused CRUD + session-ephemeral cleanup path.
  • fdb_real_protocol_suite: full protocol compatibility suite on --backend fdb (watch/multi/multi-read/fail-injection/4lw).

Run

Common commands:

./build/fdbkeeper-service --help
./build/fdbkeeper-service --version
./build/fdbkeeper-service --backend memory
./build/fdbkeeper-service --backend fdb
./build/fdbkeeper-service --backend memory --fail-op 4
./build/fdbkeeper-service --bind-host 0.0.0.0 --port 9181 --session-timeout-ms 30000
./build/fdbkeeper-service --http-control-port 9182 --prometheus-port 9183

Run with explicit config file:

./build/fdbkeeper-service --config ./config/fdbkeeper-service.xml

--backend values:

  • memory: in-process test backend (default).
  • fdb: FoundationDB backend. Requires a binary built with -DFDBKEEPER_ENABLE_FDB_C_API=ON and the fdb_c library available at runtime; otherwise startup fails fast with an explicit error.

Optional HTTP endpoints:

  • --http-control-port <n>: enables a lightweight readiness/control HTTP endpoint that serves GET /ready, GET /readyz, GET /health, and GET /ping, plus GET /metrics.
  • --prometheus-port <n>: exposes Prometheus-style metrics on GET /metrics.
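
Test harnesses typically poll the readiness endpoint before sending traffic. The following sketch does that with the standard library only. It assumes that HTTP 200 on /ready means the service is ready; the README does not document status codes or response bodies, so treat that as an assumption. The name wait_until_ready is hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url, path="/ready", timeout_s=10.0, interval_s=0.2):
    """Poll base_url + path until it answers HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + path, timeout=2.0) as response:
                if response.status == 200:  # assumption: 200 means ready
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, or the endpoint answered with an error status
        time.sleep(interval_s)
    return False
```

With the service started using --http-control-port 9182, a caller would use wait_until_ready("http://127.0.0.1:9182").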

Run with explicit cluster file:

FDB_CLUSTER_FILE=/tmp/fdb-local/fdb.cluster \
./build-fdb/fdbkeeper-service --backend fdb --port 21810

Run a minimal leader/follower pair for replication checks:

./build/fdbkeeper-service --node-role leader --port 9181 --peers 127.0.0.1:9182
./build/fdbkeeper-service --node-role follower --port 9182 --peers 127.0.0.1:9181

Role behavior notes:

  • --node-role standalone + non-empty --peers: automatic election/failover enabled.
  • --node-role leader|follower: automatic election loop is disabled; role transitions are controlled by explicit 4lw commands (rqld / ydld).
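
For scripted replication checks, the two command lines of the leader/follower pair can be generated so that each node's --peers always points at the other node. This is a minimal sketch with a hypothetical helper name (pair_commands); it covers only the two-node case shown above, because the README does not document how multiple peers are written.

```python
def pair_commands(binary="./build/fdbkeeper-service", host="127.0.0.1",
                  leader_port=9181, follower_port=9182):
    """Return (leader_argv, follower_argv) for a two-node replication check."""

    def argv(role, port, peer_port):
        # Each node listens on its own port and names the other node as its peer.
        return [binary, "--node-role", role,
                "--port", str(port),
                "--peers", f"{host}:{peer_port}"]

    return (argv("leader", leader_port, follower_port),
            argv("follower", follower_port, leader_port))
```

Each list can be handed to subprocess.Popen, and the processes terminated once the check is done.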

Optional FDB backend fail-fast tuning:

export FDBKEEPER_FDB_TIMEOUT_MS=3000
export FDBKEEPER_FDB_RETRY_LIMIT=1

Optional FDB keyspace isolation (useful for multi-tenant tests):

export FDBKEEPER_KEY_PREFIX=fdbkeeper-service/my-isolated-prefix/
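
When several test runs share one cluster, a fresh prefix per run avoids collisions. The sketch below generates one in the same shape as the example value above; the helper name isolated_key_prefix and the random-suffix scheme are assumptions, not something the service requires.

```python
import uuid


def isolated_key_prefix(test_name, base="fdbkeeper-service"):
    """Return a keyspace prefix that is unique per call, ending in '/'."""
    # The random suffix keeps concurrent runs of the same test apart.
    return f"{base}/{test_name}-{uuid.uuid4().hex[:8]}/"
```

A test runner would place the result in FDBKEEPER_KEY_PREFIX in the environment of the service process it starts.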

4lw Commands

The service supports these 4lw commands:

  • ruok, mntr, stat, isro, conf, cons, srvr
  • wchs, wchc, wchp
  • envi, dump, srst, crst, dirs, rcvr
  • apiv, csnp, lgif, raft
  • rqld, ydld, rclc, clrs, ftfl

Notes:

  • rqld/ydld are operational role-control commands.
  • ftfl reports keeper feature flags (multi_read, filtered_list, check_not_exists, create_if_not_exists, create_ttl, persistent watch features, etc).
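
A 4lw command can be issued from a script with a plain TCP socket. The sketch below assumes the usual ZooKeeper-style convention: the client sends the four ASCII characters, the server writes its reply and closes the connection. The README does not spell out the wire behavior, so treat that as an assumption; four_letter_word is a hypothetical helper name.

```python
import socket


def four_letter_word(host, port, command, timeout_s=5.0):
    """Send a 4lw command and return the server's reply as text."""
    if len(command) != 4:
        raise ValueError("4lw commands are exactly four characters")
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For a service listening on port 9181, four_letter_word("127.0.0.1", 9181, "ruok") would perform a basic liveness probe, and the same call with "rqld" or "ydld" would drive the role-control commands.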

Replication Observability

When running with peers (for example --node-role leader --peers 127.0.0.1:9182), the raft 4lw command reports:

  • replication_log_size: in-memory delta-log entry count.
  • replication_next_idx: next local delta index.
  • replication_peer_count: number of peers with tracked replicated index.
  • replication_peer_min_idx / replication_peer_max_idx: replicated-index spread across peers.
  • replication_force_snapshot: 1 means next replication cycle will force snapshot fallback.

lgif includes:

  • replication_log_size
  • replication_next_idx
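
The replication fields listed above can be turned into a rough lag figure for monitoring. The sketch below assumes the 4lw output is one key/value pair per line, separated by whitespace, '=' or ':'; the actual output format is not documented here, so adjust the parser to what the service prints. The lag formula (next local index minus the slowest peer's replicated index) is likewise an interpretation of the field descriptions, not a documented metric.

```python
import re


def parse_key_values(text):
    """Parse 'key<sep>value' lines into a dict, converting integers where possible."""
    stats = {}
    for line in text.splitlines():
        parts = re.split(r"[\s=:]+", line.strip(), maxsplit=1)
        if len(parts) != 2 or not parts[0]:
            continue  # skip blank lines and lines without a value
        key, value = parts
        try:
            stats[key] = int(value)
        except ValueError:
            stats[key] = value
    return stats


def max_peer_lag(stats):
    """Approximate entries the slowest peer is behind, or None if fields are missing."""
    try:
        return stats["replication_next_idx"] - stats["replication_peer_min_idx"]
    except (KeyError, TypeError):
        return None
```

A lag that keeps growing while replication_force_snapshot stays at 0 would be a reasonable signal to investigate the slowest peer.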

2PC Replication Modes

Default mode (recommended for production):

  • Environment variable FDBKEEPER_2PC_LOG_COMMIT is unset.
  • Leader uses explicit follower RPC rounds (tx_prepare + tx_commit/tx_abort) before success response.
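
The explicit-round flow can be pictured with a toy model. The following is an illustrative sketch of a generic two-phase commit round, not the service's implementation: the class and function names are hypothetical, and the number of acknowledgements the leader needs is a parameter because the README does not state it (the default here, every follower, is the classic 2PC rule).

```python
class InMemoryFollower:
    """Stand-in for a follower; records the RPCs it receives."""

    def __init__(self, accept=True):
        self.accept = accept
        self.log = []

    def prepare(self, txn):
        self.log.append(("tx_prepare", txn))
        return self.accept

    def commit(self, txn):
        self.log.append(("tx_commit", txn))

    def abort(self, txn):
        self.log.append(("tx_abort", txn))


def two_phase_commit(followers, txn, required_acks=None):
    """Run prepare on every follower, then commit or abort the prepared ones.

    Returns True if the transaction committed, which is the point at which
    a leader could send a success response to the client.
    """
    if required_acks is None:
        required_acks = len(followers)  # assumption: all followers must ack
    prepared = [f for f in followers if f.prepare(txn)]
    if len(prepared) >= required_acks:
        for follower in prepared:
            follower.commit(txn)
        return True
    for follower in prepared:
        follower.abort(txn)  # roll back followers that had already prepared
    return False
```

The key property the sketch shows is ordering: no follower sees tx_commit until the prepare round has collected enough acknowledgements, and a failed round ends with tx_abort instead.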

Optional log-encoded mode:

  • Set FDBKEEPER_2PC_LOG_COMMIT=1.
  • Prepare/decision are encoded in replication log entries (2p records).
  • During in-flight foreground write transactions, quorum replication uses bounded retries and disables snapshot fallback for those decision-critical rounds, preventing stale-snapshot quorum acknowledgements.

Environment Variables

  • FDB_CLUSTER_FILE: cluster file path for --backend fdb.
  • FDBKEEPER_KEY_PREFIX: backend keyspace prefix isolation.
  • FDBKEEPER_FDB_TIMEOUT_MS: FDB operation timeout in milliseconds.
  • FDBKEEPER_FDB_RETRY_LIMIT: retry count for FDB backend operations.
  • FDBKEEPER_SUPER_DIGEST: super-user digest id for ACL bypass.
  • FDBKEEPER_AVAILABILITY_ZONE: value published under /keeper/availability_zone.
  • FDBKEEPER_2PC_LOG_COMMIT: enable log-encoded 2PC mode.
  • FDBKEEPER_DEBUG_ERRORS, FDBKEEPER_DEBUG_2PC: debug logging toggles.
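
When the service is launched from a test script, these variables are easiest to manage as an explicit environment mapping. The sketch below is one way to do that; keeper_env is a hypothetical helper, and it covers only the variables listed above whose values are described. It removes FDBKEEPER_2PC_LOG_COMMIT when the log-encoded mode is not requested, since the default 2PC mode is defined by that variable being unset.

```python
import os


def keeper_env(cluster_file=None, key_prefix=None, timeout_ms=None,
               retry_limit=None, log_commit_2pc=False, base=None):
    """Return an environment dict for starting fdbkeeper-service."""
    env = dict(os.environ if base is None else base)
    settings = {
        "FDB_CLUSTER_FILE": cluster_file,
        "FDBKEEPER_KEY_PREFIX": key_prefix,
        "FDBKEEPER_FDB_TIMEOUT_MS": timeout_ms,
        "FDBKEEPER_FDB_RETRY_LIMIT": retry_limit,
    }
    for name, value in settings.items():
        if value is not None:
            env[name] = str(value)
    if log_commit_2pc:
        env["FDBKEEPER_2PC_LOG_COMMIT"] = "1"
    else:
        # Default mode requires the variable to be unset, not merely empty.
        env.pop("FDBKEEPER_2PC_LOG_COMMIT", None)
    return env
```

The result can be passed as the env argument of subprocess.Popen together with the --backend fdb command line.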
