A Simple Pattern for Alfresco Extensions

Over the years that I have worked with and for Alfresco, I have written a ton of Alfresco extensions.  Some of these are for customers, some are for my own education, some are for R&D spikes, and so on.  I'd like to share a common pattern that comes in handy.  If you are a super experienced Alfresco developer, this article probably isn't for you.  You know this stuff already!

There are a lot of ways to build Alfresco extensions, and a lot of ways to integrate your own code or connect Alfresco to another product.  There are also a lot of ways you might want to call your own code or an integration, whether that is from an Action, a Behavior, a Web Script, a scheduled job, or via the Alfresco Javascript API.  One way to make your extension as flexible as possible is to use what could informally be called the “Service Action Pattern”.

The Service Action Pattern

[Sequence diagram: the Service Action Pattern]

Let's start by describing the Service Action Pattern.  In this pattern, we take the functionality that we want to make available to Alfresco and wrap it in a service object.  This is a well-established pattern in the Alfresco world, used extensively in Alfresco's own public API.  Things like the NodeService, ActionService, and ContentService all take core functionality found in the Alfresco platform and wrap it in a well-defined service interface: a set of public APIs that return Alfresco objects like NodeRefs, Actions, and Paths, or Java primitives.  The service object is where all of our custom logic lives, and it provides a well-defined interface for other objects to use.  In many ways the service object acts as an adapter, translating back and forth between the domain-specific objects your extension requires and the objects Alfresco understands.  When designing a new service in Alfresco, I find it is a best practice to limit the types of objects returned by the service layer to things that Alfresco natively understands.  If your service object method creates a new node, return a NodeRef, for example.
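To make that concrete, here is a minimal sketch of what such a service interface might look like.  The names (ArchiveService, archiveNode, isArchived) are hypothetical and invented for illustration; the point is that the methods accept and return things Alfresco already understands, like NodeRefs and primitives.

import org.alfresco.service.cmr.repository.NodeRef;

/**
 * Hypothetical service interface wrapping the extension's custom logic.
 * Callers (Actions, behaviors, web scripts) only ever see Alfresco types
 * such as NodeRef, never the integration's internal types.
 */
public interface ArchiveService {

    /** Archive the content of the given node and return the same NodeRef. */
    NodeRef archiveNode(NodeRef nodeRef);

    /** Check whether the node's content has already been archived. */
    boolean isArchived(NodeRef nodeRef);
}

The implementation class is just a Spring bean declared in your module's context file, which is also where anything that needs the service (like the Actions below) gets its reference to it.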

A custom service object on its own isn’t terribly useful, since Alfresco doesn’t know what to do with it.  This is where an Alfresco Action comes in handy.  We can use one or more Alfresco Actions to call the services that our service object exposes.  Creating an action to call the service object has several advantages.  First, once you have an Action you can easily call that Action (and thus the underlying service object) from the Javascript API (more on this in a moment).  Second, it is easy to take an Action and surface it in Alfresco Share for testing or so your users can call it directly.  Actions can also be triggered by folder rules, which can be useful if you need to call some code when a document is created or updated.  Finally, Actions are registered with Alfresco, which makes them easy to find and call from other Java or server side Javascript code via the ActionService.  If you want to do something to a file or folder in Alfresco there is a pretty good chance that an Action is the right approach.
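To show how thin the Action layer can stay, here is a rough sketch of an action executer that delegates to the hypothetical ArchiveService above.  The class and bean names are made up; the base class and callback methods are the standard Alfresco ones.

import java.util.List;

import org.alfresco.repo.action.executer.ActionExecuterAbstractBase;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ParameterDefinition;
import org.alfresco.service.cmr.repository.NodeRef;

/**
 * Thin Action wrapper: sanity checks and wiring only.
 * All of the real logic lives in the service object.
 */
public class ArchiveActionExecuter extends ActionExecuterAbstractBase {

    private ArchiveService archiveService; // injected via Spring

    public void setArchiveService(ArchiveService archiveService) {
        this.archiveService = archiveService;
    }

    @Override
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef) {
        // Delegate straight to the service; nothing interesting happens here.
        archiveService.archiveNode(actionedUponNodeRef);
    }

    @Override
    protected void addParameterDefinitions(List<ParameterDefinition> paramList) {
        // This example takes no parameters beyond the node it acts upon.
    }
}

The executer is typically registered as a Spring bean with the action-executer bean as its parent, and the bean id becomes the action name you use to look it up later.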

Using the Service Action Pattern also makes it simple to expose your service object as a REST API.  Remember that Alfresco Actions can be located and called easily from the Javascript API.  The Javascript API also happens to be (IMHO) the simplest way to build a new Alfresco Web Script.  If you need to call your Action from another system (a very common requirement) you can simply create a web script that exposes your action as a URL and call away.  This does require a bit of boilerplate code to grab request parameters and pass them to the Action, which in turn will call your service object.  It isn’t too much and there are lots of great examples in the Alfresco documentation and out in the community.
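Whether the caller is a JavaScript web script controller or another piece of Java, the invocation is the same idea: look the action up by name, set any parameters, and execute it against a node.  A rough Java version follows; the action name and parameter are hypothetical.  In a JavaScript controller the equivalent is the actions root object, and the rest of the web script is just pulling parameters off the request before doing the same create-and-execute.

import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ActionService;
import org.alfresco.service.cmr.repository.NodeRef;

public class ArchiveCaller {

    private ActionService actionService; // injected via Spring

    public void setActionService(ActionService actionService) {
        this.actionService = actionService;
    }

    public void archive(NodeRef nodeRef) {
        // "archive-content" is the hypothetical bean id / action name.
        Action archive = actionService.createAction("archive-content");
        // If the action declared parameters, they would be set by name:
        // archive.setParameterValue("vault-name", "my-vault");
        actionService.executeAction(archive, nodeRef);
    }
}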

So why not just bake the code into the Action itself?  Good question!  First, any project of some complexity is likely to have a group of related functionality.  A good example can be found in the AWS Glacier Archive for Alfresco project we built a couple of years ago at an Alfresco hack-a-thon.  That project required Actions for archiving content, initiating a retrieval, and retrieving content.  All of these Actions are logically and functionally related, so it makes sense to group them together in a single service.  If you want the details of how Alfresco integrates with AWS Glacier, you just have to look at the service implementation class; the Action classes themselves are just sanity checks and wiring.  Another good reason to put your logic into a service class is for reuse outside of Actions.  Actions carry some overhead, and depending on how you plan to use your extension you may want to make your logic available directly to a behavior or expose it to the Alfresco Javascript API via a root scope object.  Both of these are straightforward if you have a well-defined service object.
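As an example of that reuse, a behavior can call the same service directly, with no Action in the middle.  Here is a minimal sketch, again using the hypothetical ArchiveService and binding to content creation purely for illustration; the policy, binding, and notification-frequency pieces are the standard Alfresco behavior machinery.

import org.alfresco.model.ContentModel;
import org.alfresco.repo.node.NodeServicePolicies;
import org.alfresco.repo.policy.Behaviour;
import org.alfresco.repo.policy.JavaBehaviour;
import org.alfresco.repo.policy.PolicyComponent;
import org.alfresco.service.cmr.repository.ChildAssociationRef;

/**
 * Behavior that reuses the service object directly, bypassing the Action layer.
 */
public class ArchiveOnCreateBehaviour implements NodeServicePolicies.OnCreateNodePolicy {

    private PolicyComponent policyComponent; // injected via Spring
    private ArchiveService archiveService;   // injected via Spring

    public void setPolicyComponent(PolicyComponent policyComponent) {
        this.policyComponent = policyComponent;
    }

    public void setArchiveService(ArchiveService archiveService) {
        this.archiveService = archiveService;
    }

    public void init() {
        // Bind to node creation for cm:content, firing once per transaction commit.
        policyComponent.bindClassBehaviour(
                NodeServicePolicies.OnCreateNodePolicy.QNAME,
                ContentModel.TYPE_CONTENT,
                new JavaBehaviour(this, "onCreateNode",
                        Behaviour.NotificationFrequency.TRANSACTION_COMMIT));
    }

    @Override
    public void onCreateNode(ChildAssociationRef childAssocRef) {
        archiveService.archiveNode(childAssocRef.getChildRef());
    }
}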

I hope this helps you build your next awesome Alfresco platform extension; I have found it a useful way to implement and organize my Alfresco projects.

Importing Legacy CSV Data into Elasticsearch

[Diagram: CSV to Elasticsearch]

I use Salesforce at work quite a bit, and one of the things I find endlessly frustrating about it is the lack of good reporting functions.  Often I end up just dumping all the data I need into a CSV file and opening it up in Excel to build the reports I need.  Recently I was trying to run some analysis on cases and case events over time.  As usual, Salesforce "reports" (which are really little more than filtered lists with some limited predefined object joins) were falling well short of what I needed.  I've been playing around with Elasticsearch for some other purposes, such as graphing and analyzing air quality measurements taken over time.  I've also seen people on my team at Alfresco use Elasticsearch for some content analytics work with Logstash.  Elasticsearch and Kibana lend themselves well to analyzing the kind of time-series data I was working with in Salesforce.

The Data

Salesforce reporting makes it simple to export your data as a CSV file.  It’s a bit “lowest common denominator”, but it will work for my purposes.  What I’m dumping into that CSV is a series of support case related events, such as state transitions, comments, etc.  What I want out of it is the ability to slice and dice that data in a number of ways to analyze how customers and support reps are interacting with each other.  How to get that CSV data into Elasticsearch?  Logstash.  Logstash has a filter plugin that simplifies sucking CSV data into the index (because of course it does).

The Configuration

Importing CSV data into Elasticsearch using Logstash is pretty straightforward.  To do this, we need a configuration file for Logstash that defines where the input data comes from, how to filter it, and where to send it.  The example below is a version of the config file that I used.  It assumes that the output will be an Elasticsearch instance running locally on port 9200, and it will stash the data in an index named "supportdata".  It will also output the data to stdout for debugging purposes; that's not something you'd want in production with a huge volume, but for my case it's handy to see.  The filter section contains the list of columns that will be imported, and the csv filter's other options (separator, convert, and so on) give you finer-grained control over how each row is parsed.

input {
  file {
    path => ["/path/to/my/file.csv"]
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => [
      "EventDateTime",
      "User",
      "ElapsedTime",
      "CustomerName",
      "…",
      "…"
    ]
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "supportdata"
  }
  stdout {}
}
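To run the import, point Logstash at the config with bin/logstash -f followed by the path to the file above; each row of the CSV becomes a document in the supportdata index.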

Debugging

No project goes perfectly right the first time, and this was no exception.  I use a Mac for work, and when I first tried to get Logstash to import the data it would run, but nothing would show up in the index.  I turned on debugging to see what was happening, and saw the following output:

[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.pipeline ] Pushing flush onto pipeline

This block just repeated over and over again.  So what's going wrong?  Obviously Logstash can see the file and can read it.  It is properly picking up the fact that the file has changed, but it isn't turning the CSV rows into events and moving them into Elasticsearch.  It turns out that Logstash's file input is sensitive to line-ending characters; the exported file presumably didn't have the Unix-style line endings it was looking for, so it never saw a complete line to process.  Simply opening the CSV in TextWrangler and saving it with Unix line endings fixed the problem.

Now that I can easily get my CSV formatted event data into Elasticsearch, the next step is to automate all of this so that I can just run my analysis without having to deal with manually exporting the report.  It looks like this is possible via the Salesforce Reports and Dashboards REST API.  I’m just getting my head around this particular Salesforce API, and at first glance it looks like there is a better way to do this than with CSV data.  I’m also looking into TreasureData as an option, since it appears to support pulling data from Salesforce and pushing it into Elasticsearch.  As that work progresses I’ll be sure to share whatever shakes out of it!