36 changes: 20 additions & 16 deletions docs/technical_documentation/concepts/pipelines.rst
@@ -3,10 +3,29 @@ Pipelines

Aspects provides two pipelines, which are detailed below.

Vector Pipeline
###############

As of Aspects version 4.0, the Vector pipeline is the default pipeline (previous
versions defaulted to a Celery / Ralph pipeline). It works by capturing the LMS log
output (standard output) and sending the events it contains directly to configured
"sinks", or data destinations. It implements two similar pipelines: one for xAPI data
(enabled by default) and one for tracking logs (disabled by default).
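
The ClickHouse destinations of these two pipelines are controlled by Tutor variables. The sketch below lists them with the default values given on the Vector concepts page; treat it as illustrative only, since variable names have changed between Aspects versions and the defaults of your installed plugin are authoritative:

.. code-block:: yaml

    # ClickHouse database used by the Vector pipelines
    ASPECTS_VECTOR_DATABASE: "openedx"
    # Table for raw tracking logs (that pipeline is disabled by default)
    ASPECTS_VECTOR_RAW_TRACKING_LOGS_TABLE: "_tracking"
    # Table for xAPI statements
    ASPECTS_RAW_XAPI_TABLE: "xapi_events_all"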

Vector is lighter weight than Ralph, and data will generally arrive faster.
It can also be a good choice if you want to add other listeners for that data
(for example, to store xAPI statements in S3).

To learn more about Vector, see the `Vector documentation <https://vector.dev/docs/>`_.

To configure Vector as your pipeline, see the :ref:`Quick Start - Vector guide <quick-start-vector>`.


Ralph Pipeline
##############

Prior to Aspects version 4.0, the Ralph pipeline was the default. It is now an alternative
pipeline, and remains the most robust option. It retries the
most important failed events and catches most duplicates before they reach the database.
This pipeline consists of a plugin in the LMS (``event-routing-backends``) that sends
events over HTTP to the Ralph API.
@@ -19,18 +38,3 @@ Ralph is for sharing xAPI data using the LRS standard.
To learn more about Ralph, see the `Ralph documentation <https://openfun.github.io/ralph/>`_.

To configure Ralph as your pipeline, see the :ref:`Quick Start - Ralph guide <quick-start-ralph>`.

5 changes: 4 additions & 1 deletion docs/technical_documentation/concepts/ralph.rst
@@ -14,10 +14,13 @@ Although Ralph has usages such as:
- Validate xAPI statements.
- Store events to different `backends <https://openfun.github.io/ralph/latest/features/backends/>`_.

In the Aspects project, Ralph is an optional API server that connects Open edX
and the ClickHouse database. Ralph receives the xAPI statements from Open edX, validates
them, and stores them in the ClickHouse database.

To use Ralph as your xAPI transport, you must set ``ASPECTS_XAPI_SOURCE: ralph`` and
``RUN_RALPH: True`` in your Tutor configuration.
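
A minimal sketch of those settings in Tutor's ``config.yml``. The ``RUN_VECTOR: False`` line is optional; it follows the Ralph quick start's recommendation to run only one transport:

.. code-block:: yaml

    # Use Ralph as the xAPI transport instead of the default Vector pipeline
    RUN_RALPH: True
    ASPECTS_XAPI_SOURCE: ralph
    # Optional: avoid running both transports at once
    RUN_VECTOR: False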

By default, Ralph is connected to the Open edX platform via Event Routing Backends without any filter
and receives all the xAPI statements. To learn more about event-routing-backends, please
refer to the `documentation <https://event-routing-backends.readthedocs.io/en/latest/>`_.
4 changes: 2 additions & 2 deletions docs/technical_documentation/concepts/vector.rst
@@ -4,7 +4,7 @@ Vector
******

Vector is a lightweight, ultra-fast tool for building observability pipelines.
As of Aspects version 4.0, Vector is the default tool used to
capture xAPI learner statements in the ClickHouse database and/or to
store raw tracking log statements. It can also be used as a general-purpose log collector
and forwarder.
@@ -43,6 +43,6 @@ Those tables are controlled by the variables:

    ASPECTS_VECTOR_DATABASE: "openedx"
    ASPECTS_VECTOR_RAW_TRACKING_LOGS_TABLE: "_tracking"
    ASPECTS_RAW_XAPI_TABLE: "xapi_events_all"

To learn more about Vector, see the `Vector documentation <https://vector.dev/docs/>`_.
@@ -10,6 +10,29 @@ Aspects can be configured to send xAPI events to ClickHouse in several different

At a high level, the options are:

Vector (default)
----------------

**Recommended for:** Most deployments, from resource-constrained Tutor local environments to larger production stacks.

Vector is a log forwarding service that monitors the logs from Docker containers or Kubernetes pods. It writes events directly to ClickHouse and automatically batches events based on volume. The LMS is configured to transform and log xAPI events in-process, and Vector picks them up by reading the logs.

Pros:

- Removes the need to run or scale Ralph
- Automatic batching adjustments
- Fastest delivery times to ClickHouse
- Vector failures do not impact other systems
- Allows backing up and restoring data from an S3-compatible backend

Cons:

- It is a new service for most operators
- Events are not de-duplicated before insert, which can result in some duplicate or incorrect data in a log replay or disaster recovery situation
- Needs a Vector pod running for every LMS or CMS Kubernetes worker
- Transforming xAPI events in-process adds a small amount of overhead to any LMS request that sends an xAPI statement


Celery tasks without batching (default from 1.0.0 until 4.0)
------------------------------------------------------------

@@ -52,29 +75,6 @@ Cons:
- Batching is not as well tested (as of Redwood) and may have edge cases until it has been used in production


Event Bus (experimental)
------------------------

12 changes: 9 additions & 3 deletions docs/technical_documentation/quickstarts/ralph.rst
@@ -5,16 +5,22 @@ Ralph

Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects

Ralph is an alternative option for sending xAPI events to ClickHouse, providing full xAPI
learning record store (LRS) statement support and deduplication (prior to Aspects version
4.0, Ralph was the default). To use Ralph as your xAPI pipeline, enable it
and set it as the xAPI source in your ``config.yml`` file.

.. code-block:: yaml

    RUN_RALPH: True
    ASPECTS_XAPI_SOURCE: ralph

    # We recommend running only one transport for performance reasons,
    # so turn off Vector if you are using Ralph for xAPI
    RUN_VECTOR: False

When ``ASPECTS_XAPI_SOURCE`` is set to ``ralph``, the xAPI data will be stored in the database defined by ``RALPH_DATABASE`` (defaults to ``xapi``).
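
For illustration, the destination database can be written out explicitly next to the source. This sketch only restates the documented default; whether a different value is safe on an existing deployment is not covered on this page:

.. code-block:: yaml

    ASPECTS_XAPI_SOURCE: ralph
    # ClickHouse database that receives xAPI data from Ralph (documented default)
    RALPH_DATABASE: "xapi"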


Aspects provides the following configuration options:

14 changes: 6 additions & 8 deletions docs/technical_documentation/quickstarts/vector.rst
@@ -5,18 +5,16 @@ Vector

Installation instructions for Aspects are available on the plugin site: https://github.com/openedx/tutor-contrib-aspects

Vector is the default option for sending xAPI events to ClickHouse in Aspects. It is enabled by default with the following settings:

.. code-block:: yaml

    # Default settings
    RUN_VECTOR: True
    RUN_RALPH: False
    ASPECTS_XAPI_SOURCE: vector

When ``ASPECTS_XAPI_SOURCE`` is set to ``vector``, the xAPI data will be stored in the database defined by ``ASPECTS_VECTOR_DATABASE`` (defaults to ``openedx``).
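
As an illustration, the destination database can be stated explicitly alongside the source. This sketch only repeats the documented default value; other values are not covered on this page:

.. code-block:: yaml

    ASPECTS_XAPI_SOURCE: vector
    # ClickHouse database that receives xAPI data from Vector (documented default)
    ASPECTS_VECTOR_DATABASE: "openedx"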


Aspects provides the following configuration options: