One of the more frequently requested features in Alfresco is the ability to set a disk quota for either an Alfresco Share site or a specific folder. At our recent company hackathon in Atlanta, some folks on Alfresco's consulting team (including yours truly) decided to tackle quotas. Adding folder quotas is fairly straightforward: we just need to calculate the size of a folder, track size changes and halt an upload if it would push the folder over its quota. Once folder quotas are working, adding a quota to an Alfresco Share site is a snap, since sites are just a specialized type of folder. Making it scale well enough to perform on a very large Alfresco instance is a bit trickier! Our implementation of folder / site quotas uses a new content model (including a new aspect), content policies and custom behaviors, an asynchronous queue, a couple of scheduled jobs and a transaction listener. The source code and installable AMP packages are available on Google Code, released under an Apache license.
Content Model and New Aspects
The first thing folder quotas require is two new properties on the folder that carries the quota. To manage a folder's size, we need to know its current size and its maximum allowed size. Our implementation uses an aspect to attach these two properties to a folder (or any of its subtypes, including Site). The aspect is defined in a content model XML file that is loaded at bootstrap using the standard Alfresco dictionaryModelBootstrap. The model defines two new properties, fq:sizeCurrent and fq:sizeQuota.
Folder Quota Behavior and Content Policies
Once we have a place to store the current folder size and the quota size, we need a way to keep the current size up to date. We considered a few options before settling on content policies. Content policies make it possible to bind new behaviors to content lifecycle events within Alfresco. For example, a new behavior can be bound to the beforeDeleteNode method, which allows custom code to be executed immediately before a node is deleted. The folder quota module binds behaviors to four content policies:
ContentServicePolicies.OnContentPropertyUpdatePolicy
Behaviors bound to ContentServicePolicies.OnContentPropertyUpdatePolicy are called when a node's content property is updated. In practice, this means they are called when the content of a node changes, but not when its metadata properties change. The folder quota code uses this policy to increment or decrement a folder's current size when the content of a node is altered.
NodeServicePolicies.BeforeDeleteNodePolicy
Behaviors bound to NodeServicePolicies.BeforeDeleteNodePolicy are called immediately before a node is deleted. However, if the behavior does not specify the right notification frequency, the node might already be gone by the time the behavior runs. To avoid this, the behavior is registered with Behaviour.NotificationFrequency.FIRST_EVENT. This ensures that the bound behavior is called while the node still exists, so that its size can be retrieved and subtracted from the folder size.
NodeServicePolicies.OnMoveNodePolicy
Behaviors bound to NodeServicePolicies.OnMoveNodePolicy are called when a node is moved from one location to another. In the folder quota module, this behavior is used to decrement the size of the folder that the node is being moved from, and increment the size of the destination folder.
NodeServicePolicies.OnAddAspectPolicy
Behaviors bound to NodeServicePolicies.OnAddAspectPolicy come into play when the fq:quota aspect is applied to a folder. If the extension is configured to do so, adding the aspect will trigger a size calculation on the folder to which the aspect has been applied. This can be an expensive operation, and is one of the areas where further performance optimization needs to happen.
The first thing that happens when any of the behaviors noted above (other than OnAddAspectPolicy) is triggered is a check for whether the updated node resides within a folder that has a quota applied. This is done by walking up the tree of folders that contain the node, looking for a folder with a quota applied. If one is found, the size change calculation and property updates are done. If the root of the store is reached without finding a folder with a quota applied, no further action is taken. This operation is fast in most cases, but takes more time the deeper the folder structure becomes. The check is linear (O(n)), meaning the time it takes is proportional to the depth of the folder structure that must be walked.
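The upward walk described above can be sketched in plain Java. This is only an illustration and not the module's code: FolderNode, hasQuotaAspect and findQuotaFolder are hypothetical names, and a real behavior would ask the repository for each node's parent and aspects rather than reading fields.

```java
import java.util.Optional;

/** Hypothetical stand-in for a repository node; not an Alfresco class. */
final class FolderNode {
    final String name;
    final FolderNode parent;      // null for the store root
    final boolean hasQuotaAspect; // stands in for "does this folder carry fq:quota?"

    FolderNode(String name, FolderNode parent, boolean hasQuotaAspect) {
        this.name = name;
        this.parent = parent;
        this.hasQuotaAspect = hasQuotaAspect;
    }
}

final class QuotaLookupSketch {
    /**
     * Walks from the node's parent toward the root and returns the nearest
     * ancestor with a quota, or empty if the root is reached first.
     * The loop runs once per ancestor, so the cost is linear in folder depth.
     */
    static Optional<FolderNode> findQuotaFolder(FolderNode node) {
        for (FolderNode current = node.parent; current != null; current = current.parent) {
            if (current.hasQuotaAspect) {
                return Optional.of(current);
            }
        }
        return Optional.empty(); // no quota folder above this node: nothing to do
    }

    public static void main(String[] args) {
        FolderNode root = new FolderNode("root", null, false);
        FolderNode site = new FolderNode("site", root, true);
        FolderNode docs = new FolderNode("docs", site, false);
        FolderNode file = new FolderNode("report.pdf", docs, false);
        System.out.println(findQuotaFolder(file).map(f -> f.name).orElse("none"));
    }
}
```

The sketch stops at the nearest quota folder. How nested quotas should behave is a design decision this example does not try to answer.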
For another view of content policies, how they can be applied and some sample code, take a look at Jared Ottley's blog article on the subject.
Calculating Folder Size and Change Size
A folder in Alfresco doesn't have a size property by default, which is why we add one via an aspect. It should come as no surprise, then, that Alfresco does not track the size of a folder either. One of the most expensive operations associated with the folder quota code is the initial calculation of folder size. This is currently done by recursively traversing the folder tree under the folder with the quota applied, totalling up the size of all of the files contained within. For very large directory structures this may take some time. There are currently two ways to perform this initial calculation. The first is to enable "onAddAspect" calculations by setting "updateUsageOnAddAspect" to "true" in the module-context.xml file. The second is to enable the "folderUsageRecalculateTrigger" job in module-context.xml. This allows the expensive folder size calculations to be scheduled for a time when load is low or the impact on users will be minimal.
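As a rough illustration of the recursive calculation, here is a minimal sketch over an in-memory tree. TreeNode and calculateSize are hypothetical names; the real module would list child nodes and read content sizes through the repository services, which is where the cost comes from.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory node: a folder (size 0, with children) or a file (size > 0). */
final class TreeNode {
    final long contentSize; // bytes of content held by this node; 0 for folders
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(long contentSize) {
        this.contentSize = contentSize;
    }

    /** Adds a child and returns this node so trees can be built fluently. */
    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }
}

final class FolderSizeSketch {
    /** Sums the content size of every descendant, visiting each node exactly once. */
    static long calculateSize(TreeNode folder) {
        long total = 0;
        for (TreeNode child : folder.children) {
            total += child.contentSize + calculateSize(child);
        }
        return total;
    }
}
```

Because every descendant is visited, the work grows with the number of nodes under the quota folder, which is why scheduling it for a quiet period can make sense.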
Once we have the initial folder size (0 for a new folder, or calculated when the quota aspect is added or on a schedule), we need to keep this value updated. For the events outlined above that have an impact on content size, we calculate the size of the change in-transaction. This is necessary for a number of reasons. Perhaps the most important is enforcement of the quota! If a content add or update pushes a folder over its defined quota, the transaction is rolled back and an error is logged. Another reason to calculate the size in-transaction is how deletes are handled. If we tried to calculate the size change from a delete after the fact, the node would already be gone (along with its properties) and the folder usage would quickly become inaccurate, requiring an expensive recalculation.
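The in-transaction check might look something like the following sketch. The class and method names are hypothetical, and throwing a runtime exception stands in for whatever the module does to make the surrounding transaction roll back. The sketch also assumes that changes which shrink usage are always allowed, which the description above does not state.

```java
/** Hypothetical exception type; throwing it is assumed to roll back the transaction. */
final class QuotaExceededException extends RuntimeException {
    QuotaExceededException(String message) {
        super(message);
    }
}

final class QuotaCheckSketch {
    /** Size change for an update: positive when content grows, negative when it shrinks or is deleted. */
    static long sizeChange(long oldSize, long newSize) {
        return newSize - oldSize;
    }

    /**
     * Rejects a change that would push usage past the quota.
     * Assumption: changes that reduce usage (change <= 0) are never rejected.
     */
    static void enforce(long sizeCurrent, long sizeQuota, long change) {
        if (change > 0 && sizeCurrent + change > sizeQuota) {
            throw new QuotaExceededException(
                "Change of " + change + " bytes would exceed quota of " + sizeQuota + " bytes");
        }
    }
}
```

A delete would be handled by calling sizeChange(oldSize, 0) while the node still exists, which is why the size has to be read before the node is gone.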
So, we have a good rationale for why the calculation of the change size needs to happen in-transaction. But what about the property update?
Asynchronous Property Updates: Scheduled Jobs vs. Transaction Listener
When developing this extension, one of our goals was to minimize the impact of the module on the time it takes for a content action (add, update, delete) to complete. To achieve this, we wanted to move everything that wasn't essential out of the transaction. As it turns out, a lot needs to be done in-transaction. For the reasons stated above, the size change must be calculated before the transaction completes. But we can update the folder size with the calculated change value asynchronously. The first release of this module has two ways to accomplish the folder size property updates. Our original implementation uses a simple in-memory queue to store the updates. This queue is then processed by a scheduled job which reads the changes in the queue and updates the folder size property accordingly. The upside to this approach is that adding items to the queue is cheap, and the queue can be processed on a variable schedule. The downside is that an item can be added to a quota folder before previous increment operations have been applied. In this case a folder could go over quota. However, once the queued changes have been applied, any subsequent adds / updates to the same folder would be rejected. Think of it as a "soft" quota. The less frequently the job is run, the more likely it is that content could be added before the folder size is updated.
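To make the queue approach concrete, here is a minimal sketch using standard java.util.concurrent classes. All names are hypothetical, and a map of folder IDs to sizes stands in for the fq:sizeCurrent property; in the module, a scheduled job would be what calls processQueue.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical queue entry: a size delta for one quota folder. */
final class SizeChange {
    final String folderId;
    final long delta;

    SizeChange(String folderId, long delta) {
        this.folderId = folderId;
        this.delta = delta;
    }
}

final class UsageQueueSketch {
    private final Queue<SizeChange> pending = new ConcurrentLinkedQueue<>();
    // Stands in for the fq:sizeCurrent property of each quota folder.
    private final Map<String, Long> sizeCurrent = new ConcurrentHashMap<>();

    /** Called in-transaction: cheap, only records the change. */
    void enqueue(String folderId, long delta) {
        pending.add(new SizeChange(folderId, delta));
    }

    /** Body of the scheduled job: drains the queue and applies each delta. Returns the number applied. */
    int processQueue() {
        int applied = 0;
        SizeChange change;
        while ((change = pending.poll()) != null) {
            sizeCurrent.merge(change.folderId, change.delta, Long::sum);
            applied++;
        }
        return applied;
    }

    long currentSize(String folderId) {
        return sizeCurrent.getOrDefault(folderId, 0L);
    }
}
```

Between enqueue and the next processQueue run, currentSize still reports the old value. That gap is the window in which a folder can slip over its quota.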
After some testing and looking at code that others have written to do similar actions when nodes are modified, a second method for updating the quota folder size worked its way into the code. This method uses a transaction listener. The transaction listener is bound using AlfrescoTransactionSupport.bindListener. The listener implements one method, afterCommit, which is called after the current transaction commits. This listener in turn spins off a thread that carries out the updates. It works well and doesn't have the limitations of a queue processing job that runs at a fixed interval, although it is still theoretically possible for a new content update to occur before the folder size has been updated from previous content updates. However, this approach is probably a little more computationally expensive.
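The shape of the transaction listener approach can be sketched without any Alfresco classes. Everything below is a hypothetical stand-in: AfterCommitListener mimics only the afterCommit callback, SketchTransaction mimics a transaction that notifies bound listeners, and an AtomicLong stands in for the folder size property. The real module binds its listener through AlfrescoTransactionSupport.bindListener instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical cut-down listener with only the afterCommit callback. */
interface AfterCommitListener {
    void afterCommit();
}

/** Hypothetical stand-in for a transaction that notifies bound listeners on commit. */
final class SketchTransaction {
    private final List<AfterCommitListener> listeners = new ArrayList<>();

    void bindListener(AfterCommitListener listener) {
        listeners.add(listener);
    }

    void commit() {
        for (AfterCommitListener listener : listeners) {
            listener.afterCommit();
        }
    }

    /** A rolled-back transaction never fires afterCommit. */
    void rollback() {
        listeners.clear();
    }
}

final class AfterCommitUpdateSketch {
    final AtomicLong sizeCurrent = new AtomicLong(); // stands in for fq:sizeCurrent
    private final ExecutorService pool = Executors.newFixedThreadPool(2);

    /** Called in-transaction with the already calculated delta; the update itself is deferred. */
    void recordChange(SketchTransaction tx, long delta) {
        tx.bindListener(() -> pool.execute(() -> sizeCurrent.addAndGet(delta)));
    }

    /** Stops accepting work and waits briefly for queued updates; returns true if all finished. */
    boolean shutdownAndWait() {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the update is handed to a thread pool only after commit, a rolled-back upload never touches the folder size, but a second upload can still start before the pooled update has run.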
These two options (along with a few others that are kicking around) are going to get some benchmarking in the near future to see which is faster.
Wiring Everything Together
The various components that make up the folder quota module are wired together with Spring, like almost everything in Alfresco. The module-context.xml file defines the beans that make up the application: the queue (not required if the queue-based update is not turned on), the scheduled jobs that process the queue and do initial folder size calculations (or recalculations if things get out of sync), a custom behavior, the thread pool used by the transaction listener, the quota content model and a simple class to perform the size calculations.
Conclusion and Next Steps
The folder quota code as it stands today should work well for small to medium sized Alfresco instances, and also for larger instances as long as care is taken to ensure that the initial size calculations don't get out of hand. However, there is still work to be done to ensure that it will scale appropriately and work in all cases. We also plan to add unit test coverage and run some benchmarks against a repository with quotas enabled to get a better picture of the performance impact. In the meantime, give it a go and please open an issue on the project issue tracker if something goes terribly wrong.