* nsd: block on removal of services
This PR uses a WaitGroup to ensure workload removals are complete
before returning from ServiceRegistrationHandler.RemoveWorkload of
the nomad service provider. The de-registration of individual services
still occurs asynchrously, but we must block on the parent removal
call so that we do not race with further operations on the same set
of services - e.g. in the case of a task restart where we de-register
and then re-register the services in quick succession.
Fixes#15032
* nsd: add e2e test for initial failing check and restart
The NSD checks tests were racey, whereby the check may not have
been triggered by the time it was queried. This change wraps the
check so it can account for this.
This removes the current ACL expiration GC section in order to get
the tests passing and allow more time to investigate the test. I
have full confidence the feature is working as expected and have
tested extensively locally.
Some tests may chose to deregister jobs to check Nomad cleanup
logic, however, it is still possible for the test to fail and exit
before this is hit. This therefore adds a cancellable cleanup func
which can be deferred, using context to control whether it gets
run or not.