Clustering
All clustering-related classes are contained within the de.jplag.clustering(.*) packages.
The clustering code is structured for ease of use: calling code should only ever interact with the ClusteringOptions, ClusteringFactory, and ClusteringResult classes.

New clustering algorithms and preprocessors can be implemented using the GenericClusteringAlgorithm and ClusteringPreprocessor interfaces, which operate on similarity matrices only. ClusteringAdapter handles the conversion between de.jplag classes and matrices. PreprocessedClusteringAlgorithm wraps another ClusteringAlgorithm so that a preprocessor is applied before it runs.
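The following is a minimal, self-contained sketch of how a matrix-only algorithm, a preprocessor, and a wrapper that combines them can fit together. All types here (MatrixAlgorithmSketch, MatrixPreprocessorSketch, ThresholdPreprocessorSketch, ConnectedComponentsSketch, PreprocessedAlgorithmSketch) are hypothetical stand-ins that only mirror the roles described above; the actual JPlag interfaces and their signatures may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Demo entry point. Every type in this file is a hypothetical stand-in, not JPlag's API.
class ClusteringSketch {
    static List<List<Integer>> demo() {
        // Symmetric similarity matrix for four submissions (made-up values).
        double[][] similarity = {
            {1.0, 0.9, 0.1, 0.0},
            {0.9, 1.0, 0.2, 0.1},
            {0.1, 0.2, 1.0, 0.8},
            {0.0, 0.1, 0.8, 1.0}
        };
        MatrixAlgorithmSketch algorithm = new PreprocessedAlgorithmSketch(
            new ThresholdPreprocessorSketch(0.5), new ConnectedComponentsSketch());
        return algorithm.cluster(similarity);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[0, 1], [2, 3]]
    }
}

// Assumed shape: an algorithm sees only a matrix and returns clusters of indices.
interface MatrixAlgorithmSketch {
    List<List<Integer>> cluster(double[][] similarity);
}

// Assumed shape: a preprocessor maps one similarity matrix to another.
interface MatrixPreprocessorSketch {
    double[][] preprocess(double[][] similarity);
}

// Zeroes out every similarity below the threshold.
class ThresholdPreprocessorSketch implements MatrixPreprocessorSketch {
    private final double threshold;

    ThresholdPreprocessorSketch(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public double[][] preprocess(double[][] similarity) {
        int n = similarity.length;
        double[][] result = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                result[i][j] = similarity[i][j] >= threshold ? similarity[i][j] : 0.0;
            }
        }
        return result;
    }
}

// Groups indices that are connected through non-zero similarities (depth-first search).
class ConnectedComponentsSketch implements MatrixAlgorithmSketch {
    @Override
    public List<List<Integer>> cluster(double[][] similarity) {
        int n = similarity.length;
        boolean[] visited = new boolean[n];
        List<List<Integer>> clusters = new ArrayList<>();
        for (int start = 0; start < n; start++) {
            if (visited[start]) {
                continue;
            }
            List<Integer> cluster = new ArrayList<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            visited[start] = true;
            while (!stack.isEmpty()) {
                int current = stack.pop();
                cluster.add(current);
                for (int next = 0; next < n; next++) {
                    if (!visited[next] && next != current && similarity[current][next] > 0.0) {
                        visited[next] = true;
                        stack.push(next);
                    }
                }
            }
            Collections.sort(cluster);
            clusters.add(cluster);
        }
        return clusters;
    }
}

// Decorator: runs the preprocessor first, then delegates to the wrapped algorithm.
class PreprocessedAlgorithmSketch implements MatrixAlgorithmSketch {
    private final MatrixPreprocessorSketch preprocessor;
    private final MatrixAlgorithmSketch delegate;

    PreprocessedAlgorithmSketch(MatrixPreprocessorSketch preprocessor, MatrixAlgorithmSketch delegate) {
        this.preprocessor = preprocessor;
        this.delegate = delegate;
    }

    @Override
    public List<List<Integer>> cluster(double[][] similarity) {
        return delegate.cluster(preprocessor.preprocess(similarity));
    }
}
```

Because both the algorithm and the preprocessor in this sketch see nothing but a double[][], neither needs to know about submissions or comparisons; in JPlag, that translation is the job the text above assigns to ClusteringAdapter.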
Remarks on Spectral Clustering
Integration Tests
There are integration tests for Spectral Clustering that verify that, at least for two known sets of similarities, the groups known to be colluders are found. However, these datasets are considered sensitive data. They are not available to the public, and the tests can only be run by maintainers with access.
To run these tests, the contents of the PseudonymizedReports repository must be added to the folder jplag/src/test/resources/de/jplag/PseudonymizedReports.