Verify generated tasks through sandboxed execution

Automatically verify that each generated benchmark task is solvable in the current sandbox environment before adding it to a suite.
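
A minimal sketch of the check described above, not the code in this change. It assumes each task carries a command whose exit code shows whether its reference solution passes. The `Task` shape and the names `check_cmd`, `is_solvable`, and `filter_solvable` are hypothetical, and a plain subprocess in a scratch directory stands in for the real sandbox.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task shape: a name plus a command whose exit code
    # signals whether the reference solution passes.
    name: str
    check_cmd: list[str]


def is_solvable(task: Task, timeout_s: float = 30.0) -> bool:
    """Run the task's check command in a scratch directory; True on exit code 0."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                task.check_cmd,
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except (subprocess.TimeoutExpired, OSError):
            # A hang or a missing executable counts as "not solvable here".
            return False
    return result.returncode == 0


def filter_solvable(tasks: list[Task], timeout_s: float = 30.0) -> list[Task]:
    """Keep only tasks whose check command succeeds in this environment."""
    return [t for t in tasks if is_solvable(t, timeout_s)]


if __name__ == "__main__":
    candidates = [
        Task("passes", [sys.executable, "-c", "raise SystemExit(0)"]),
        Task("fails", [sys.executable, "-c", "raise SystemExit(1)"]),
    ]
    print([t.name for t in filter_solvable(candidates)])
```

Treating timeouts and launch failures as "not solvable" keeps the filter conservative: a task enters the suite only if its check demonstrably succeeds in the environment where it will run.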

Relates to #70