Liveness bug in VSR implementation: Client table and uncommitted ops after view change #16

@jorangreef

I believe there may be a very subtle liveness bug at https://github.com/UWSysLab/tapir/blob/master/replication/vr/replica.cc#L450: the client table (effectively a record of committed replies) is modified on both the prepare and commit paths.

However, uncommitted ops differ from committed ops in that they may not survive a view change. The implementation does not appear to account for this: it does not fix up the client table after a view change when the table was modified by prepared ops that did not survive. As a result, some client requests can be permanently locked out, rejected as duplicates, even though they were never actually committed.
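
To make the suspected failure concrete, here is a minimal toy model in Python. It is not the TAPIR code: `NaiveReplica`, its fields, and its return strings are hypothetical names, and the view change is simplified to the worst case where no uncommitted op survives.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveReplica:
    """Toy model (hypothetical): the client table is written on the prepare path."""
    log: list = field(default_factory=list)            # entries: (client_id, request_id, op)
    commit_number: int = 0                              # number of committed log entries
    client_table: dict = field(default_factory=dict)   # client_id -> latest request_id seen

    def on_request(self, client_id, request_id, op):
        # Dedupe against the client table, whether or not the entry was committed.
        if request_id <= self.client_table.get(client_id, 0):
            return "duplicate"
        self.client_table[client_id] = request_id       # written at prepare time
        self.log.append((client_id, request_id, op))
        return "prepared"

    def on_view_change(self):
        # Assumption: no uncommitted op survives this view change.
        # The client table is not repaired, which is the suspected bug.
        del self.log[self.commit_number:]


replica = NaiveReplica()
first = replica.on_request("c1", 1, "put x")   # prepared, never committed
replica.on_view_change()                        # the op is dropped from the log
retry = replica.on_request("c1", 1, "put x")   # the client retries the same request
print(first, retry, replica.log)
```

In this model the retry is rejected as a duplicate even though the log no longer contains the op, so the request can never make progress.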

A cleaner approach might be to use the client table for a single purpose, i.e. only for committed data, and to use the in-flight pipeline to dedupe any uncommitted in-flight ops. This way the client table never needs to be patched up after a view change.
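
A rough sketch of that separation, again as a toy Python model rather than a patch against `replica.cc`. The class `Replica`, the helper `_in_flight`, and the reply format are assumptions made for illustration, and the view change again drops every uncommitted op.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy model (hypothetical): the client table holds committed data only."""
    log: list = field(default_factory=list)            # entries: (client_id, request_id, op)
    commit_number: int = 0                              # number of committed log entries
    client_table: dict = field(default_factory=dict)   # client_id -> (request_id, reply)

    def _in_flight(self, client_id, request_id):
        # The uncommitted suffix of the log acts as the in-flight pipeline.
        pending = self.log[self.commit_number:]
        return any(c == client_id and r == request_id for c, r, _ in pending)

    def on_request(self, client_id, request_id, op):
        committed = self.client_table.get(client_id)
        if committed is not None and request_id <= committed[0]:
            return "duplicate"      # already committed
        if self._in_flight(client_id, request_id):
            return "in_flight"      # prepared but not yet committed
        self.log.append((client_id, request_id, op))
        return "prepared"

    def commit_next(self):
        # The client table is written only here, on the commit path.
        client_id, request_id, op = self.log[self.commit_number]
        self.commit_number += 1
        self.client_table[client_id] = (request_id, f"ok: {op}")

    def on_view_change(self):
        # Assumption: no uncommitted op survives; nothing else needs repair.
        del self.log[self.commit_number:]


r = Replica()
steps = [r.on_request("c1", 1, "put x")]       # first attempt is prepared
steps.append(r.on_request("c1", 1, "put x"))   # resend is deduped by the pipeline
r.on_view_change()                              # uncommitted op is dropped
steps.append(r.on_request("c1", 1, "put x"))   # retry is accepted again
r.commit_next()
steps.append(r.on_request("c1", 1, "put x"))   # now a true committed duplicate
print(steps)
```

Because dropping the uncommitted suffix also drops the dedupe state that belonged to it, the retry after the view change is accepted, and the client table only ever reflects committed requests.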
