Skip to content

Stop flooding the scheduled executor with no-op tasks for change count updates#486

Open
tjwatson wants to merge 2 commits intoapache:masterfrom
tjwatson:felixLogParams
Open

Stop flooding the scheduled executor with no-op tasks for change count updates#486
tjwatson wants to merge 2 commits intoapache:masterfrom
tjwatson:felixLogParams

Conversation

@tjwatson
Copy link
Member

@tjwatson tjwatson commented Mar 12, 2026

This PR has two commits

  1. Update various debug statements to use message parameters instead of eager String concatenation.
  2. Updates to the task for setting the change count service property.

On startup SCR will blast change count updates for each
component it satisfies and activates etc.

This resulted in a very large number of task being submitted
to the scheduled executor.  Prior to using an executor a Timer
was used.  In both cases the tasks would wait a default of 5
seconds before updating the change count service property.

Every task submitted would be a no-op except the very last one
which had the "final" change count value.  This behavior is
to avoid flooding the system with service modified events.

The issue is that now we are flooding the scheduled executor
with a significant number of task that most all do nothing.
Since moving to an executor we noticed a non-trivial bump in
our CPU usage when the default 5 seconds passes to run all
the queued tasks.

It turns out that on the JVM we are using the Timer is actually
more efficient than the scheduled executor for popping off
all the tasks and running them when the delay timeout is hit.

The overall design here is sub-optimal regardless.  Flooding
a queue with all but one task that do nothing is not efficient.
This change moves to using a simple repeating task that just
updates the change count service property, if needed, every
delay period (defaults to every 5 seconds).
// was conducted, is quite important for the client's own debugging
logger.log(Level.DEBUG,
"Locating method " + getMethodName() + " within class " + theClass.getName(), null );
"Locating method {0} within class {1}", null, getMethodName(), theClass.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that with null as parem not one param to much?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, its not slf4j

{
this.registration = reg;
long delay = m_configuration.serviceChangecountTimeout();
m_componentActor.scheduleWithFixedDelay(new UpdateChangeCountProperty(reg), delay, delay, TimeUnit.MILLISECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change though, the UpdateChangeCountProperty Runnable will be continuously scheduled, rather than the current approach that only schedules the task if there is a change. Especially with a low serviceChangecountTimeout value I expect that this change's continuous nature will introduce a lot of unnecessary processing, even when no changes need to be announced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to have a small timeout? I wanted performance tests with the original change. I only got to doing that now and the current approach is expensive. It adds over 5% CPU time to initial startup. It would be nice ro see how expensive it really is to check the value of a long once every 5 seconds.

We could consider an enhancement that cancels the repeating task after a set amount of retries. Then reschedules as needed. But we should not prematurely optimize with out performance numbers that prove a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my use-case I have a monitoring function that uses the update to the changecount property as a trigger to inspect the state of the configurations within SCR to see if any of them changed state. A low timeout helps to have this trigger each my monitoring function in a timely manner. Having a large(r) timeout still eventually reaches my monitoring function, but I have seen it misses state changes in case a stream of changes continue to "invalidate" the property update, and thus not reaching my monitoring function.

The previous implementation, which relied on creating and cancelling Timers, was quite expensive with a low timeout value, as it would create a lot of short-lived Threads during startup. I offered a change to move this as task to the already existing ExecutorService (#419) which avoids the creation of threads during startup. What still happens as you rightfully pointed out is that each changecount update results in a task, and that task being scheduled for execution and executing, and the evaluation if the task was invalidated happens within the task's execution.

I think your suggestion of cancelling the task when a new one is about to be scheduled can reduce the unnecessary execution of unneeded tasks. Looking at the original approach with Timers, it appears to me that also in that case each changecount update would be scheduled on the timer, similar to what is happening right now. Though not as efficient as it perhaps can be, it looks to me similar when a Timer was still being used, but as you report an additional 5% CPU time there is clearly something going on.

I remember you in the review voicing a concern on adding these changecount update on the already existing component actor thread. I am not very familiar with what other types of tasks are executed there, but perhaps you are observing different types of tasks scheduled on the same thread interfering with each other?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjwatson I took the liberty of implementing the suggestion you made on cancelling uncompleted tasks when a new needs to be scheduled (#487). Could you have a look if that approach provides an improvement?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants