Automatically verify that generated benchmark tasks are solvable in the current sandbox environment before adding them to a suite. Relates to #70.
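
For reviewers, a rough sketch of what such a solvability gate could look like. This is illustrative only and not the code in this change: the `Task` shape, `solve_cmd`, `check_cmd`, `is_solvable`, and `filter_solvable` are hypothetical names, and the sketch assumes the process is already running inside the sandbox, so commands are launched directly with `subprocess`.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task shape; the real task schema is not shown here.
    name: str
    solve_cmd: list[str]  # command that applies a reference solution
    check_cmd: list[str]  # command that exits 0 when the task is solved


def is_solvable(task: Task, timeout: float = 30.0) -> bool:
    """Return True if the solve and check commands both exit with status 0."""
    try:
        for cmd in (task.solve_cmd, task.check_cmd):
            result = subprocess.run(cmd, capture_output=True, timeout=timeout)
            if result.returncode != 0:
                return False
    except (subprocess.TimeoutExpired, OSError):
        # A timeout or a missing executable counts as "not solvable here".
        return False
    return True


def filter_solvable(tasks: list[Task]) -> list[Task]:
    """Keep only the tasks that pass the solvability check, preserving order."""
    return [task for task in tasks if is_solvable(task)]


if __name__ == "__main__":
    demo = Task(
        name="demo",
        solve_cmd=[sys.executable, "-c", "pass"],
        check_cmd=[sys.executable, "-c", "pass"],
    )
    print([task.name for task in filter_solvable([demo])])
```

Treating timeouts and missing executables as failures keeps the gate conservative: a task is only added to the suite when the current environment has demonstrably solved it.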