GitHub Issue 875: Assay Multi-File Transform Import Skips First Data Row #7456

Merged: cnathe merged 4 commits into release25.11-SNAPSHOT from 25.11_fb_luminexTransform875 on Feb 27, 2026

@cnathe (Contributor) commented Feb 25, 2026

Rationale

#875 Luminex Multi-File Transform Import Skips First Data Row

When a Luminex assay run that includes multiple files is imported, we merge (concatenate) the data from those files into a single runData TSV file when writing the data out for the assay transform script. This PR fixes an issue where the first data row of the second and subsequent files was being skipped.

Related Pull Requests

Changes

  • TsvDataSerializer.exportData(): write the first data row regardless of whether column headers were written
  • New Experiment module metric (assayRunsWithMultipleInputFiles) that counts the Luminex and Standard assay runs imported with more than one data input file
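To make the exportData() change concrete, here is a minimal, self-contained sketch of one plausible shape of this bug class. It is not the actual TsvDataSerializer code: the class TsvConcatSketch and its methods are hypothetical, and each input file is modeled as a list of rows that already share the same column order. In the buggy variant, "don't write the header" is conflated with "skip row 0", so every file after the first loses its first data row; in the fixed variant the header flag only controls the header line.

```java
import java.util.ArrayList;
import java.util.List;

public class TsvConcatSketch {

    // Hypothetical buggy per-file export: when the header is not written,
    // row 0 is treated as a header to skip, so a real data row is dropped.
    static void exportBuggy(List<String> out, List<String> columns,
                            List<List<String>> rows, boolean writeHeader) {
        int firstRow = 0;
        if (writeHeader) {
            out.add(String.join("\t", columns));
        } else {
            firstRow = 1; // bug: skips the first data row of this file
        }
        for (int i = firstRow; i < rows.size(); i++) {
            out.add(String.join("\t", rows.get(i)));
        }
    }

    // Hypothetical fixed per-file export: the header flag only controls the
    // header line; every data row is written regardless.
    static void exportFixed(List<String> out, List<String> columns,
                            List<List<String>> rows, boolean writeHeader) {
        if (writeHeader) {
            out.add(String.join("\t", columns));
        }
        for (List<String> row : rows) {
            out.add(String.join("\t", row));
        }
    }

    // Concatenates several files into one TSV, writing the header only for the first file.
    static List<String> concat(List<String> columns, List<List<List<String>>> files, boolean fixed) {
        List<String> out = new ArrayList<>();
        for (int f = 0; f < files.size(); f++) {
            boolean writeHeader = (f == 0);
            if (fixed) {
                exportFixed(out, columns, files.get(f), writeHeader);
            } else {
                exportBuggy(out, columns, files.get(f), writeHeader);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("Well", "FI");
        List<List<List<String>>> files = List.of(
                List.of(List.of("A1", "100"), List.of("A2", "200")),
                List.of(List.of("B1", "300"), List.of("B2", "400")));
        System.out.println(concat(columns, files, false)); // the B1 row is missing
        System.out.println(concat(columns, files, true));  // all four data rows are present
    }
}
```

With two input files, the buggy path emits the header plus three data rows, while the fixed path emits all four, which matches the symptom described in #875.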

…regardless of if column headers written

- note: not yet fixed, to make sure the Selenium test fails as expected on TC first
@cnathe self-assigned this Feb 25, 2026
…of the number of Luminex and Standard assay runs that were imported with > 1 data input file
@cnathe requested a review from labkey-klum February 26, 2026 23:01
```java
// Review excerpt: the remainder of this query is not shown in the thread.
assayMetrics.put("assayRunsWithMultipleInputFiles", new SqlSelector(schema, """
SELECT COUNT(*) FROM (
SELECT sourceapplicationid, COUNT(*) AS count FROM exp.data
WHERE name NOT LIKE '%.log' AND name NOT LIKE '%.Rout' AND name NOT LIKE '%.pdf' AND sourceapplicationid IN (
```

Contributor

Wouldn't the .Rout extension only apply to transform scripts written in R? Maybe I'm wrong, but I thought that any files left behind after the transform script completes get added as data outputs.

It looks like data type information is encoded in the exp.data lsid. I haven't looked into it, but I wonder whether filtering on that might work.

Contributor Author

Yeah, if there is a better way to filter down to just the "data" files, that would be great. When I looked over the set of files (exp.data rows) linked to the run ID, I just wanted to make sure we weren't counting the log files and other generated files.

Contributor

I looked into it a bit, and there is some evidence that we use AbstractAssayProvider.RELATED_FILE_DATA_TYPE for the additional files produced by a transform script. I tried this metric variation:

```sql
SELECT COUNT(*) FROM (
    SELECT sourceapplicationid, COUNT(*) AS count FROM exp.data
    WHERE lsid NOT LIKE '%:RelatedFile.%' AND sourceapplicationid IN (
        SELECT rowid FROM exp.protocolapplication
        WHERE lsid LIKE '%:SimpleProtocol.CoreStep' AND (protocollsid LIKE '%:LuminexAssayProtocol.%' OR protocollsid LIKE '%:GeneralAssayProtocol.%')
    )
    GROUP BY sourceapplicationid
) x WHERE count > 1
```

It produced the same result as your query. Feel free to experiment with it if you're interested, but overall I don't think it is better than your query. If anything, yours might be the more conservative estimate in that it could pull in some false positives, but I think that is preferable to erring in the other direction.
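To make the trade-off between the two filters concrete, here is a hypothetical in-memory sketch (not code from the PR). The record DataRow and the class MultiFileRunMetricSketch are made up for illustration and model only the exp.data columns the queries use. The sketch assumes the rows have already been restricted to CoreStep protocol applications of Luminex and General assay protocols, which the SQL does in its subquery, and it uses case-sensitive String checks, whereas the case sensitivity of SQL LIKE depends on the database. The sample LSIDs are placeholders, not real LabKey LSIDs.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class MultiFileRunMetricSketch {

    // Hypothetical stand-in for an exp.data row (only the columns the queries use).
    record DataRow(int sourceApplicationId, String name, String lsid) {}

    // Original variant: exclude generated files by extension (.log, .Rout, .pdf).
    static final Predicate<DataRow> BY_EXTENSION = r ->
            !r.name().endsWith(".log") && !r.name().endsWith(".Rout") && !r.name().endsWith(".pdf");

    // Reviewer's variant: exclude rows whose LSID marks them as a related file.
    static final Predicate<DataRow> BY_LSID = r -> !r.lsid().contains(":RelatedFile.");

    // Mirrors "GROUP BY sourceapplicationid ... WHERE count > 1": count the protocol
    // applications that still have more than one data row after filtering.
    static long countMultiFileRuns(List<DataRow> rows, Predicate<DataRow> keep) {
        Map<Integer, Long> perApplication = rows.stream()
                .filter(keep)
                .collect(Collectors.groupingBy(DataRow::sourceApplicationId, Collectors.counting()));
        return perApplication.values().stream().filter(c -> c > 1).count();
    }

    public static void main(String[] args) {
        // Made-up rows: run 1 has two real inputs, run 2 has one real input plus a
        // related .tsv output that the extension filter does not exclude.
        List<DataRow> rows = List.of(
                new DataRow(1, "plate1.xlsx", "urn:lsid:example:Data.Folder-1:plate1"),
                new DataRow(1, "plate2.xlsx", "urn:lsid:example:Data.Folder-1:plate2"),
                new DataRow(1, "script.Rout", "urn:lsid:example:RelatedFile.Folder-1:script"),
                new DataRow(2, "plate3.xlsx", "urn:lsid:example:Data.Folder-1:plate3"),
                new DataRow(2, "output.tsv", "urn:lsid:example:RelatedFile.Folder-1:output"));
        System.out.println(countMultiFileRuns(rows, BY_EXTENSION)); // 2
        System.out.println(countMultiFileRuns(rows, BY_LSID));      // 1
    }
}
```

In this made-up data, run 2 is the kind of false positive the extension filter can let through; on real data the two variants produced the same count.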

Contributor Author

I like that better. I'm switching to the NOT LIKE '%:RelatedFile.%' filter instead. Thanks!

@cnathe changed the title from "GitHub Issue 875: Luminex Multi-File Transform Import Skips First Data Row" to "GitHub Issue 875: Assay Multi-File Transform Import Skips First Data Row" on Feb 27, 2026
@cnathe merged commit ebfd690 into release25.11-SNAPSHOT on Feb 27, 2026
10 of 11 checks passed
@cnathe deleted the 25.11_fb_luminexTransform875 branch February 27, 2026 22:04
3 participants