Use uniform sampled softmax and add <START> token. #4
Conversation
Pull Request Overview
This pull request implements uniform sampled softmax and adds a <START> token to playlist sequences. The changes introduce an alternative loss function that uses uniform sampling for the softmax denominator and modifies the data processing pipeline to prepend a special start token to all playlists.
- Implements uniform sampled softmax as an alternative to the existing label-in-denominator approach
- Adds <START> token injection to playlist sequences during data loading
- Updates experiment configurations to support the new loss function option
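The uniform sampled softmax idea can be sketched as follows. This is a minimal illustration with made-up names, not the actual `src/trecs/util.py` implementation, and it omits any importance-weighting correction for the sampled negatives:

```python
import numpy as np

def uniform_sampled_softmax_loss(logits, target, num_negatives, rng):
    """Sketch only: approximate -log softmax(logits)[target] by summing the
    denominator over the target logit plus `num_negatives` classes drawn
    uniformly at random, instead of the full vocabulary."""
    negatives = rng.integers(0, logits.shape[-1], size=num_negatives)
    pool = np.concatenate(([logits[target]], logits[negatives]))
    # Numerically stable log-sum-exp over the reduced candidate set.
    m = pool.max()
    log_denom = m + np.log(np.exp(pool - m).sum())
    return log_denom - logits[target]
```

Because the target logit is always in the pool, the sketched loss is non-negative, mirroring the full-softmax cross entropy it approximates.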
Reviewed Changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/trecs/util.py | Adds new uniform sampling function and renames existing function for clarity |
| src/trecs/experiments/decoder_only/_base.py | Integrates start token injection, loss function selection, and updates token encoding order |
| tests/test_util.py | Updates tests to cover both loss function variants using parametrization |
| src/trecs/experiments/decoder_only/decoder_only_uniform_softmax_loss.py | New experiment configuration using uniform loss function |
| tests/test_experiments.py | Adds comprehensive test for all experiment setups |
| Other files | Minor configuration updates and dependency changes |
```python
        LIMIT :context_length
    """,
    {"split": split},
    {"context_length": self.context_length},
```
The SQL query limit was changed from context_length + 1 to context_length, but since a <START> token is now prepended, the total sequence length becomes context_length + 1. This could cause issues if the downstream processing expects sequences of exactly context_length + 1 tokens.
Suggested change:

```diff
-    {"context_length": self.context_length},
+        LIMIT :context_length_plus1
+    """,
+    {"split": split},
+    {"context_length_plus1": self.context_length + 1},
```
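The length arithmetic behind this comment can be checked with a toy example (illustrative values, not the PR's code):

```python
context_length = 3
rows = ["t1", "t2", "t3", "t4", "t5"]  # illustrative track ids

# LIMIT :context_length fetches context_length rows from the playlist...
fetched = rows[:context_length]
# ...and prepending <START> makes the sequence context_length + 1 tokens.
sequence = ["<START>", *fetched]
assert len(sequence) == context_length + 1
```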
```python
    "<EOP>",
    "<UNK>",
    *(track_id for (track_id,) in cursor),
],
```
The token order in the encoder has been changed to put <START> first, but the special token assignments later in the code assume this order. Consider documenting this token order dependency or making the token retrieval more explicit to avoid future ordering issues.
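One way to make the lookup explicit, sketched with hypothetical names (not the PR's actual code): register the special tokens in one place and resolve their ids by name, so reordering the vocabulary list cannot silently change which id `<START>` or `<UNK>` maps to.

```python
# Illustrative sketch: special tokens declared once, looked up by name.
SPECIAL_TOKENS = ["<START>", "<EOP>", "<UNK>"]

def build_vocab(track_ids):
    tokens = [*SPECIAL_TOKENS, *track_ids]
    return {tok: i for i, tok in enumerate(tokens)}

token_to_id = build_vocab(["trackA", "trackB"])
start_id = token_to_id["<START>"]  # explicit lookup, not a hard-coded index
```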
<START> token: without this token, the first element of each playlist is only ever included as context, never as a target. Now the <START> token provides context for predicting the first track. This is probably not very informative beyond reproducing the marginal distribution of the first track in playlists, but it gives us a bit of extra signal early in the playlist.
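The effect described above can be seen in the standard next-token training setup (a toy sketch, not the repo's code):

```python
def training_pairs(tokens):
    # Standard next-token setup: input at position i, target at position i+1.
    return list(zip(tokens[:-1], tokens[1:]))

playlist = ["t1", "t2", "t3"]  # illustrative track ids

# Without <START>, "t1" is never a prediction target...
targets = [tgt for (_, tgt) in training_pairs(playlist)]
assert "t1" not in targets
# ...but with <START> prepended, the model also learns to predict the
# first track (roughly, its marginal distribution).
pairs = training_pairs(["<START>", *playlist])
assert pairs[0] == ("<START>", "t1")
```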