
feat: update relationship schema #2661

Merged
omkar-ethz merged 5 commits into master from relationship_schema_update
Apr 9, 2026

@omkar-ethz (Member) commented Apr 8, 2026

Description

Add fields to the relationship schema used by the relationships field in the dataset schema.

Testing:
PATCH /api/v4/datasets/:pid

{
  "relationships": [
    {
      "identifier": "https://scilog.development.psi.ch/logbooks/6895bea625f055bca783dfdd",
      "identifierType": "URL",
      "entityType": "Logbook",
      "externalId": "6895bea625f055bca783dfdd"
    },
    {
      "identifier": "10.1016/j.epsl.2011.11.037",
      "identifierType": "DOI",
      "entityType": "JournalArticle"
    }
  ]
}
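For reference, the same test request can be assembled in TypeScript. This is a minimal sketch, not code from the PR: the base URL, token and dataset PID are placeholders (assumptions), and buildRelationshipPatch is a hypothetical helper that only builds the request so it can be inspected. The PID is URL-encoded because SciCat PIDs may contain a slash (e.g. PID.SAMPLE.PREFIX/psi_ds3, used later in this thread).

```typescript
// One entry of the relationships array, as sent in the request above.
interface RelationshipInput {
  identifier: string;
  identifierType: string;
  entityType: string;
  relationship?: string;
  externalId?: string;
}

// Plain description of an HTTP request that can be handed to fetch().
interface PatchRequest {
  url: string;
  method: "PATCH";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: baseUrl and token are assumptions, not part of the PR.
function buildRelationshipPatch(
  baseUrl: string,
  pid: string,
  token: string,
  relationships: RelationshipInput[],
): PatchRequest {
  return {
    // Encode the PID, since values like "PREFIX/suffix" contain a slash.
    url: `${baseUrl}/api/v4/datasets/${encodeURIComponent(pid)}`,
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${token}`,
    },
    body: JSON.stringify({ relationships }),
  };
}

const req = buildRelationshipPatch(
  "http://localhost:3000", // hypothetical local backend
  "PID.SAMPLE.PREFIX/psi_ds1", // hypothetical dataset PID
  "<token>", // placeholder access token
  [
    {
      identifier: "https://scilog.development.psi.ch/logbooks/6895bea625f055bca783dfdd",
      identifierType: "URL",
      entityType: "Logbook",
      externalId: "6895bea625f055bca783dfdd",
    },
    {
      identifier: "10.1016/j.epsl.2011.11.037",
      identifierType: "DOI",
      entityType: "JournalArticle",
    },
  ],
);
```

In Node 18 or later the result can be sent with fetch(req.url, { method: req.method, headers: req.headers, body: req.body }).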

Updated dataset contains:

"relationships": [
  {
    "identifier": "https://scilog.development.psi.ch/logbooks/6895bea625f055bca783dfdd",
    "identifierType": "URL",
    "relationship": "IsReferencedBy",
    "entityType": "Logbook",
    "externalId": "6895bea625f055bca783dfdd",
    "_id": "69d7b7e797c6654f48cab9df"
  },
  {
    "identifier": "10.1016/j.epsl.2011.11.037",
    "identifierType": "DOI",
    "relationship": "IsReferencedBy",
    "entityType": "JournalArticle",
    "_id": "69d7b7e797c6654f48cab9e0"
  }
]
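The request omitted relationship, yet both stored entries carry IsReferencedBy, and each entry received an _id. A client-side preview of what will be stored can mirror that defaulting. In this sketch the default value is inferred from the example above rather than read from the backend schema, and normalizeRelationship is a hypothetical name.

```typescript
// Stored shape of one relationships entry (without the database-assigned _id).
interface Relationship {
  identifier: string;
  identifierType: string;
  relationship: string;
  entityType: string;
  externalId?: string;
}

// What a client may send: relationship is optional.
type RelationshipInput = Omit<Relationship, "relationship"> & {
  relationship?: string;
};

// Inferred from the response above, not taken from the backend source.
const DEFAULT_RELATIONSHIP = "IsReferencedBy";

function normalizeRelationship(input: RelationshipInput): Relationship {
  // ?? keeps an explicit value and only fills in a missing one.
  return { ...input, relationship: input.relationship ?? DEFAULT_RELATIONSHIP };
}

const preview: Relationship[] = [
  {
    identifier: "10.1016/j.epsl.2011.11.037",
    identifierType: "DOI",
    entityType: "JournalArticle",
  },
  {
    identifier: "https://scilog.development.psi.ch/logbooks/6895bea625f055bca783dfdd",
    identifierType: "URL",
    entityType: "Logbook",
    externalId: "6895bea625f055bca783dfdd",
    relationship: "IsDocumentedBy", // hypothetical explicit value
  },
].map(normalizeRelationship);
```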

A corresponding frontend feature (widget / tab on dataset detail page) will follow.

Motivation

Previously, RelationshipClass was dataset-specific. However, there is a need to link to related entities outside of SciCat, e.g. a SciLog logbook.
So we add an entity type field, entityType (originally named relatedEntityType, inspired by DataCite's resourceTypeGeneral). The existing relationship field corresponds roughly to DataCite's relationType.
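To make the correspondence concrete, here is a minimal sketch that turns one entry into the shape of a DataCite relatedIdentifiers item (relatedIdentifier, relatedIdentifierType, relationType, resourceTypeGeneral). It is not part of this PR: it uses the field names from the testing example, toDataCite is a hypothetical helper, and the list of accepted resource types is an illustrative subset of DataCite's controlled vocabulary, with Other as the fallback for values such as Logbook that DataCite does not define.

```typescript
// SciCat relationships entry, using the field names from the testing example.
interface Relationship {
  identifier: string;
  identifierType: string;
  relationship: string;
  entityType: string;
  externalId?: string;
}

// One DataCite relatedIdentifiers item (JSON attribute names).
interface DataCiteRelatedIdentifier {
  relatedIdentifier: string;
  relatedIdentifierType: string;
  relationType: string;
  resourceTypeGeneral: string;
}

// Illustrative subset only; see the DataCite Metadata Schema for the full list.
const KNOWN_RESOURCE_TYPES: string[] = ["Dataset", "JournalArticle", "Software", "Text", "Other"];

function toDataCite(rel: Relationship): DataCiteRelatedIdentifier {
  // externalId is SciCat-side bookkeeping and is intentionally not exported.
  return {
    relatedIdentifier: rel.identifier,
    relatedIdentifierType: rel.identifierType,
    // relationship plays roughly the role of DataCite's relationType.
    relationType: rel.relationship,
    // entityType is modelled on resourceTypeGeneral but may hold values
    // DataCite does not define (e.g. "Logbook"); fall back to "Other".
    resourceTypeGeneral:
      KNOWN_RESOURCE_TYPES.indexOf(rel.entityType) !== -1 ? rel.entityType : "Other",
  };
}
```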

Fixes

  • NA

Changes:

  • generalize description of pid and relationship fields from "dataset" to "entity"
  • add additional fields to the schema as indicated above

Tests included

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated

official documentation info

@omkar-ethz omkar-ethz marked this pull request as ready for review April 8, 2026 16:05
@omkar-ethz omkar-ethz requested a review from a team as a code owner April 8, 2026 16:05
@omkar-ethz omkar-ethz requested a review from emigun April 9, 2026 07:53

@minottic (Member) left a comment


LGTM, thanks!

I am wondering whether relatedIdentifierType might be valuable. Apparently, for DOIs for example, DataCite uses it later to count citations, which might become valuable when we publish the dataset.

@omkar-ethz (Member, Author) commented Apr 9, 2026


Thanks for the review! I think this is a good idea.
Currently, the assumption was that relatedIdentifierType is always URL, so we had the url field in the model.

But we can indeed align it more closely with DataCite: remove the url field, as @HayenNico suggests, and rename pid to relatedIdentifier. This field would contain the URL if relatedIdentifierType is URL, a DOI if relatedIdentifierType is DOI, and so on.
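Because one identifier field then carries a URL, a DOI or another kind of identifier, a client (for example the planned frontend widget) could pre-select the type from the value. A minimal sketch under simplifying assumptions: guessIdentifierType is a hypothetical helper, only bare DOIs such as 10.1016/j.epsl.2011.11.037 are recognised (a simplified pattern, not a full DOI grammar), and the type names follow the identifierType values from the testing example.

```typescript
type GuessedType = "DOI" | "URL" | "Unknown";

// Simplified DOI check: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
const DOI_PATTERN = /^10\.\d{4,9}\/\S+$/;
// Absolute http(s) URL without whitespace.
const URL_PATTERN = /^https?:\/\/\S+$/i;

function guessIdentifierType(identifier: string): GuessedType {
  const value = identifier.trim();
  if (DOI_PATTERN.test(value)) return "DOI";
  if (URL_PATTERN.test(value)) return "URL";
  // Anything else (e.g. a SciCat PID) needs an explicit type from the user.
  return "Unknown";
}
```

A guess like this should only pre-fill the field; the identifierType supplied by the user stays authoritative.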

We could add an optional externalId field that stores the id of the external entity (e.g. logbookId), and have an index on both externalId and relatedIdentifier?

"relationships": [
  {
    "relationship": "IsReferencedBy",
    "relatedEntityType": "Logbook",
    "externalId": "6895bea625f055bca783dfdd",  // id of the related entity in the external system, opaque for SciCat
    "relatedIdentifier": "https://scilog.development.psi.ch/logbooks/6895bea625f055bca783dfdd",
    "relatedIdentifierType": "URL" 
  }
]
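The suggested indexes and the lookup they serve could look as follows. This is a sketch, not the PR's implementation: it uses the merged field names (identifier rather than relatedIdentifier), and relationshipIndexes, byExternalId and findByExternalId are hypothetical names. The index specs and filter are plain MongoDB documents; dot notation on an array of subdocuments matches a dataset if any of its relationship entries matches, and indexes on such paths are multikey indexes.

```typescript
// Index key specs in MongoDB's { field: 1 } form (ascending).
const relationshipIndexes: Array<Record<string, 1>> = [
  { "relationships.externalId": 1 },
  { "relationships.identifier": 1 },
];

// Filter matching datasets where ANY relationships entry has this externalId.
function byExternalId(externalId: string): Record<string, string> {
  return { "relationships.externalId": externalId };
}

// Minimal dataset shape for the in-memory equivalent below.
interface DatasetLike {
  pid: string;
  relationships?: { identifier: string; externalId?: string }[];
}

// In-memory equivalent of byExternalId, e.g. for unit tests without a database.
function findByExternalId(datasets: DatasetLike[], externalId: string): string[] {
  return datasets
    .filter((d) => (d.relationships || []).some((r) => r.externalId === externalId))
    .map((d) => d.pid);
}
```

With the MongoDB Node.js driver, these objects would be passed to createIndex and find on the datasets collection.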

Question: as we are renaming the fields, do we need a migration script? Or can we do without one, since the class does not seem to have been used before?
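If a migration turns out to be needed, the per-entry transform could be small. A hedged sketch: the legacy shape (pid, relationship, optional url) is assumed from the fields named in this thread rather than taken from the old model, migrateRelationship is a hypothetical name, and reading pid as a local dataset PID follows the identifierType: Local convention proposed further down. Entries that already have identifier pass through unchanged, so re-running the migration is harmless.

```typescript
// Hypothetical legacy entry shape, based on the fields named in this thread.
interface LegacyRelationship {
  pid: string;
  relationship: string;
  url?: string;
}

interface Relationship {
  identifier: string;
  identifierType: string;
  relationship: string;
  entityType: string;
  externalId?: string;
}

// Idempotent: already-migrated entries are returned unchanged.
function migrateRelationship(entry: LegacyRelationship | Relationship): Relationship {
  if ("identifier" in entry) return entry;
  return {
    // The old class was dataset-specific, so pid is read as a local dataset PID.
    identifier: entry.pid,
    identifierType: "Local",
    entityType: "Dataset",
    relationship: entry.relationship,
    // NOTE: url is not carried over; a real migration must decide what to do with it.
  };
}
```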

@nitrosx (Member) commented Apr 9, 2026

It looks good to me.
Do we want to use the DataCite naming, which is long?
Also, does this cover the case where I want to express an arbitrary relationship between two different datasets within the same SciCat instance? If so, how would we do that? Do you already have something in mind?

@omkar-ethz (Member, Author) commented Apr 9, 2026


Thanks a lot! Yes, the DataCite names were quite verbose, so I dropped the related- prefix from the field names.

For SciCat entities, I have proposed using identifierType: Local; then, if entityType is Dataset, we can interpret the identifier as the PID of a SciCat dataset. Similarly, we can use identifierType: Local with an entityType of Proposal or Instrument to link to other local entities.

So we could express a derived dataset relationship as:

{
    "identifier": "PID.SAMPLE.PREFIX/psi_ds3",
    "identifierType": "Local",
    "relationship": "IsDerivedFrom",
    "entityType": "Dataset"
}

I added this to the documentation of the identifierType field for now, but we could also enforce it in validation in the future, or use this information in the frontend to treat local identifiers specially if needed.
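A possible shape for that future validation and for the special frontend treatment, written as framework-free TypeScript. Both helpers are hypothetical: localTarget resolves a Local entry to the SciCat entity it points at, and validateRelationship applies only the rules described in this comment (a non-empty identifier, and Local only together with Dataset, Proposal or Instrument).

```typescript
interface Relationship {
  identifier: string;
  identifierType: string;
  relationship: string;
  entityType: string;
  externalId?: string;
}

// Entity types that may be referenced with identifierType "Local" (per this thread).
type LocalEntityType = "Dataset" | "Proposal" | "Instrument";
const LOCAL_ENTITY_TYPES: string[] = ["Dataset", "Proposal", "Instrument"];

function isLocalEntityType(value: string): value is LocalEntityType {
  return LOCAL_ENTITY_TYPES.indexOf(value) !== -1;
}

// Resolve a Local entry to the SciCat entity it points at, or null otherwise.
function localTarget(rel: Relationship): { entityType: LocalEntityType; id: string } | null {
  const type = rel.entityType;
  if (rel.identifierType !== "Local" || !isLocalEntityType(type)) return null;
  return { entityType: type, id: rel.identifier };
}

// Collect problems; an empty array means the entry passes these (hypothetical) rules.
function validateRelationship(rel: Relationship): string[] {
  const errors: string[] = [];
  if (rel.identifier.trim() === "") {
    errors.push("identifier must not be empty");
  }
  if (rel.identifierType === "Local" && !isLocalEntityType(rel.entityType)) {
    errors.push(`entityType "${rel.entityType}" cannot be used with identifierType "Local"`);
  }
  return errors;
}
```

For the derived-dataset example above, localTarget returns the Dataset PID PID.SAMPLE.PREFIX/psi_ds3, and validateRelationship reports no problems.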

@omkar-ethz omkar-ethz merged commit fb15a66 into master Apr 9, 2026
16 checks passed
@omkar-ethz omkar-ethz deleted the relationship_schema_update branch April 9, 2026 16:19

4 participants