Support for running transform scripts during result updates.#6196
labkey-klum merged 9 commits into `develop`
Conversation
```java
Container container,
User user,
List<Map<String, Object>> rows,
List<Map<String, Object>> oldKeys
```
minor: looks like this param is unused and can be removed
good catch, done.
```java
{
    List<Map<String, Object>> rowsForTransform = resolveRows(container, user, rows);
    AssayTransformContext context = new AssayTransformContext(container, user, rowsForTransform, _schema.getProtocol(), _schema.getProvider());
    TransformResult result = DataTransformService.get().transformAndValidate(context, null, DataTransformService.TransformOperation.UPDATE);
```
I'm surprised that the second param (the run) isn't needed here, but maybe it is just used for writing the transform run properties (which aren't supported for update).
Yes, luckily for us it isn't. On the way out, it's used to create some of the run properties info. Coming back in, we use it to create a protocol application so that the experiment run graph shows the transform as an additional step (which is very cool). Unfortunately, on update there is no lineage graph we can tie the operation to, so it isn't apparent that a transform has been run. I think this is okay for now, but it's something we may want to consider in the future. A derivation protocol feels like the wrong way to go here, but maybe an audit record?
```java
if (dataTypeHandled)
    throw new BatchValidationException(new ValidationException(String.format("There was more than one transformed file found for the data type : %s.", context.getProvider().getDataType())));
dataTypeHandled = true;
```
What is an example of when this exception would be hit? The ExpData object here is the specific run data row, right?
Would this happen if, for some reason, the update transform script wrote out the same row primary key value to multiple rows?
I can't think of a direct way a script could intentionally do this, but if we made a mistake in the processing of script outputs, it would mean multiple files were generated for the result data (and we wouldn't have a good way to pick the correct one), so this feels like an unrecoverable state.
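The invariant being discussed can be sketched generically: once transform outputs are keyed by data type, a second file for the same type leaves no reliable way to pick the correct one, so failing fast is the only safe behavior. This is a hypothetical Python illustration of that guard, not the LabKey implementation; `pick_transformed_files` and its input shape are made up for the example:

```python
def pick_transformed_files(outputs):
    """Map each data type to its single transformed file.

    outputs: iterable of (data_type, file_path) pairs produced by a
    transform script. Raises ValueError if any data type appears more
    than once, since there is no reliable way to pick the correct file.
    """
    by_type = {}
    for data_type, path in outputs:
        if data_type in by_type:
            # Mirrors the server-side guard: a duplicate is unrecoverable.
            raise ValueError(
                f"There was more than one transformed file found for the data type: {data_type}.")
        by_type[data_type] = path
    return by_type
```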
Rationale
This PR includes two main changes:
Related Pull Requests
Changes
- `AssayResultUpdateService`, which is where updates happen. If the transform script has altered any rows, those will be merged with the original set of results.
- `AbstractAssayTsvDataHandler.insertRowData`, which would be after any transform scripts may have been run.
- `DataTransformService` 94:95.
- `transformOperation`, so that a script author can conditionally add logic for different transform operations:

Run info properties for Update
During update, a `runProperties.tsv` file is still written out for the transform script to parse. While no new properties have been introduced, the properties available during update are a subset of those available during insert. The run properties available during update include (see the spec for more details):
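As a sketch of what the script side might look like, a transform script could read `transformOperation` out of `runProperties.tsv` and branch on it, as described above. This is a minimal Python illustration under stated assumptions: a simple tab-separated name/value layout is assumed, `read_run_properties` and `handle_operation` are hypothetical names, and the exact file format and property keys should be checked against the spec:

```python
import csv

def read_run_properties(path):
    """Parse runProperties.tsv into a dict.

    Assumes each line is tab-separated with the property name in the
    first column and its value in the second (extra columns, if any,
    are ignored).
    """
    props = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) >= 2:
                props[row[0]] = row[1]
    return props

def handle_operation(props):
    """Branch on the transform operation reported by the server.

    During update only a subset of run properties is written out, so
    the update branch should not rely on insert-only properties.
    """
    operation = props.get("transformOperation", "INSERT")
    if operation == "UPDATE":
        return "update"
    return "insert"
```

A script would call `read_run_properties` on the path the server passes in, then use `handle_operation` (or inline equivalent) to pick the right code path.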