Stop flooding the scheduled executor with no-op tasks for change count updates#486
Stop flooding the scheduled executor with no-op tasks for change count updates#486tjwatson wants to merge 2 commits intoapache:masterfrom
Conversation
On startup SCR will blast change count updates for each component it satisfies and activates etc. This resulted in a very large number of task being submitted to the scheduled executor. Prior to using an executor a Timer was used. In both cases the tasks would wait a default of 5 seconds before updating the change count service property. Every task submitted would be a no-op except the very last one which had the "final" change count value. This behavior is to avoid flooding the system with service modified events. The issue is that now we are flooding the scheduled executor with a significant number of task that most all do nothing. Since moving to an executor we noticed a non-trivial bump in our CPU usage when the default 5 seconds passes to run all the queued tasks. It turns out that on the JVM we are using the Timer is actually more efficient than the scheduled executor for popping off all the tasks and running them when the delay timeout is hit. The overall design here is sub-optimal regardless. Flooding a queue with all but one task that do nothing is not efficient. This change moves to using a simple repeating task that just updates the change count service property, if needed, every delay period (defaults to every 5 seconds).
| // was conducted, is quite important for the client's own debugging | ||
| logger.log(Level.DEBUG, | ||
| "Locating method " + getMethodName() + " within class " + theClass.getName(), null ); | ||
| "Locating method {0} within class {1}", null, getMethodName(), theClass.getName()); |
There was a problem hiding this comment.
is that with null as parem not one param to much?
scr/src/main/java/org/apache/felix/scr/impl/inject/methods/BaseMethod.java
Show resolved
Hide resolved
| { | ||
| this.registration = reg; | ||
| long delay = m_configuration.serviceChangecountTimeout(); | ||
| m_componentActor.scheduleWithFixedDelay(new UpdateChangeCountProperty(reg), delay, delay, TimeUnit.MILLISECONDS); |
There was a problem hiding this comment.
With this change though, the UpdateChangeCountProperty Runnable will be continuously scheduled, rather than the current approach that only schedules the task if there is a change. Especially with a low serviceChangecountTimeout value I expect that this change's continuous nature will introduce a lot of unnecessary processing, even when no changes need to be announced.
There was a problem hiding this comment.
Is there a reason to have a small timeout? I wanted performance tests with the original change. I only got to doing that now and the current approach is expensive. It adds over 5% CPU time to initial startup. It would be nice ro see how expensive it really is to check the value of a long once every 5 seconds.
We could consider an enhancement that cancels the repeating task after a set amount of retries. Then reschedules as needed. But we should not prematurely optimize with out performance numbers that prove a problem.
There was a problem hiding this comment.
In my use-case I have a monitoring function that uses the update to the changecount property as a trigger to inspect the state of the configurations within SCR to see if any of them changed state. A low timeout helps to have this trigger each my monitoring function in a timely manner. Having a large(r) timeout still eventually reaches my monitoring function, but I have seen it misses state changes in case a stream of changes continue to "invalidate" the property update, and thus not reaching my monitoring function.
The previous implementation, which relied on creating and cancelling Timers, was quite expensive with a low timeout value, as it would create a lot of short-lived Threads during startup. I offered a change to move this as task to the already existing ExecutorService (#419) which avoids the creation of threads during startup. What still happens as you rightfully pointed out is that each changecount update results in a task, and that task being scheduled for execution and executing, and the evaluation if the task was invalidated happens within the task's execution.
I think your suggestion of cancelling the task when a new one is about to be scheduled can reduce the unnecessary execution of unneeded tasks. Looking at the original approach with Timers, it appears to me that also in that case each changecount update would be scheduled on the timer, similar to what is happening right now. Though not as efficient as it perhaps can be, it looks to me similar when a Timer was still being used, but as you report an additional 5% CPU time there is clearly something going on.
I remember you in the review voicing a concern on adding these changecount update on the already existing component actor thread. I am not very familiar with what other types of tasks are executed there, but perhaps you are observing different types of tasks scheduled on the same thread interfering with each other?
This PR has two commits