Making Chat Content Flow with Alfresco


Let’s start with an axiom: in a modern business, chat is, as much as email, where business gets done.  Every company I have worked with or for in the past decade has come to rely increasingly on chat to coordinate activities within and across teams.  This is great, as chat provides a convenient, asynchronous way to work together.  In terms of urgency of response and formality, it fits nicely into the niche between a phone call and an email.  It is no surprise, then, that the uptake of chat tools for business has been high.

I’m a big proponent of using chat for our teams, but using it has uncovered a few challenges.  One of them is managing the content that comes out of chats.  Just about every chat tool has a content sharing facility for sending docs back and forth.  That’s great, but what happens to that content once it is pasted into a chat?  Often that content becomes somewhat ephemeral, maybe viewed when it is dropped into the chat but then forgotten.  What happens when you have segments of a chat that are valuable and should be saved?  If you are using chat as a support channel, for example, that chat content may well form the foundation for knowledge that you want to capture and reuse.  If the chat is related to a deal that sales is working, you might want to capture it as part of your sales process so others know what is happening.

This sort of “knowledge leakage” can be partially addressed by the search functions built into chat tools, but those are often limited.  Some tools can only go back a certain amount of time or a certain number of messages with their baked-in search.  The quality of this search depends on which tool you are using and whether you are on a paid or free version of that tool.  Frustratingly, chat tools do not typically index the content of documents shared via chat.  This means you can probably search for the title of the document or the context in which it was shared, but not the content of the document itself.  Regardless of the quality of search, it only solves part of the problem.  Content in chat may be discoverable, but it isn’t easily shareable or captured in a form that is useful for attaching or including in other processes.  In short, chat content creates chaos.  How can we tame this and make chat a better channel for sharing, capturing, curating and finding real knowledge?

Within our Customer Success team at Alfresco we have done some small research projects into this problem, and have even solved it to a certain extent.  Our first crack at it came in the form of an application that captures certain chats from our teams and saves the chat logs into an Alfresco repository.  This is a great start, as it partially solves one of our problems: chat logs are no longer ephemeral; they are captured as documents and saved to Alfresco’s Content Services platform.  From there they are indexed, taggable, and linkable, so we can easily share something that came up in the chat with others, in context.  This approach is great for capturing whole chats, but what about saving selected segments, or capturing documents that are attached to chats?

Solving both of these problems is straightforward with Alfresco’s Content Services platform, a good chat tool with a great API, and a little glue.  For this solution I have set out a few simple goals:

  1. The solution should automatically capture documents added to a chat, and save those documents to Alfresco.
  2. The solution should post a link to the saved document in the chat in which the document originated so that it is easy to find in the content repository.  This also ensures that captured chat logs will have a valid link to the content.  We could ask people to post a doc somewhere and then share a link in the chat, but why do that when we can make it frictionless?
  3. The solution should allow chats or segments of chats to be captured as a text document and saved to Alfresco.
  4. The solution should allow for searching for content without leaving the chat, with search results posted to the chat.

Looking at the goals above, a chat bot seems like a fairly obvious solution.  Chat bots can listen in on a chat channel and act automatically when certain things happen, or they can be explicitly called to action as needed.  A simple chat bot that speaks Alfresco could meet all of the requirements.  Such a bot would be added to a chat channel, where it listens for documents being uploaded.  When that happens, the bot retrieves the document from the chat, uploads it to an Alfresco repository, and posts the link back to the chat.  The bot would also need to listen for itself to be called upon to archive part of a chat, at which point it retrieves the specified part of the chat from the provider’s API, saves it to Alfresco, and posts a link back to the chat.  Finally, the bot would need to listen for itself to be invoked to perform a search, fetch the search terms from the chat, execute a search in Alfresco, and post the formatted results back to the chat channel.  This gives us the tools we need to make content and knowledge capture possible in chat without putting a bunch of extra work on the plate of the chat users.

I’ve put together a little proof of concept for this idea and released it as an open source project on GitHub (side note, thank you Andreas Steffan for Dockerizing it!).  It’s implemented as a chatbot for Slack that uses the BotKit framework for the interaction bits.  It’s a simplistic implementation, but it gets the point across.  Upon startup the bot connects to your Slack and can be added to a chat just like any other bot or user.  Once it is there, it will listen for its name and for file upload events, responding more or less as described above.  Check out the readme for more info.
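To give a feel for the moving parts, here is a rough sketch of the file-capture piece in classic BotKit-style JavaScript.  This is not the project’s actual code: the Alfresco endpoint, credentials, target folder, and the exact shape of Slack’s file event payload are assumptions for illustration, and the real bot talks to Alfresco via CMIS and direct REST calls.

// Rough sketch: capture files shared in Slack and push them into Alfresco.
// ALFRESCO_BASE, TARGET_FOLDER, and the admin credentials are placeholders.
var Botkit = require('botkit');
var request = require('request');

var ALFRESCO_BASE = 'http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1';
var TARGET_FOLDER = '-root-';  // hypothetical folder node to file things under

var controller = Botkit.slackbot({});
controller.spawn({ token: process.env.SLACK_TOKEN }).startRTM();

// Fires when someone shares a file in a channel the bot belongs to.
controller.on('file_share', function(bot, message) {
  var file = message.file;  // Slack attaches the file metadata to the event

  // Stream the file out of Slack and straight into the Alfresco v1 REST API,
  // then post the new node's id back into the chat.
  request.post({
    url: ALFRESCO_BASE + '/nodes/' + TARGET_FOLDER + '/children',
    auth: { user: 'admin', pass: 'admin' },
    formData: {
      filedata: request.get({
        url: file.url_private,
        headers: { Authorization: 'Bearer ' + process.env.SLACK_TOKEN }
      })
    }
  }, function(err, res, body) {
    if (err) { return bot.reply(message, 'Upload failed: ' + err); }
    var nodeId = JSON.parse(body).entry.id;
    bot.reply(message, 'Saved ' + file.name + ' to Alfresco as node ' + nodeId);
  });
});

The archive and search commands hang off the same controller via BotKit’s hears() listener and follow the same pattern: gather what you need from the Slack event, call Alfresco, and reply with the result.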

There is plenty of room for this to grow.  A few things that I’d like to do right away:

  1. Rework the bot to use the Alfresco Unified Javascript API instead of CMIS and direct REST API calls.
  2. Get smarter about the way content gets stored in Alfresco.  Right now it’s just a little logic that builds a folder structure by channel; this could be better.
  3. Improve the way metadata is handled for saved documents / chats so they are easier to find in the future.  Maybe a field that stores chat participant names?
  4. Some NLP smarts.  Perhaps run a chat excerpt through AWS Comprehend and tag it with the extracted topics?
  5. Workflow integration.  I’d love to see the ability to post a document to the chat and request a review which then triggers an Alfresco Process Services process.

If you want to work on this together, pull requests are always welcome!


A Simple Pattern for Alfresco Extensions

Over the years that I have worked with and for Alfresco, I have written a ton of Alfresco extensions.  Some of these were for customers, some for my own education, some for R&D spikes, and so on.  I’d like to share a common pattern that comes in handy.  If you are a super experienced Alfresco developer, this article probably isn’t for you.  You know this stuff already!

There are a lot of ways to build Alfresco extensions, and a lot of ways to integrate your own code or connect Alfresco to another product.  There are also a lot of ways you might want to call your own code or an integration, whether that is from an Action, a Behavior, a Web Script, a scheduled job, or via the Alfresco Javascript API.  One way to make your extension as flexible as possible is to use what could informally be called the “Service Action Pattern”.

The Service Action Pattern

[Sequence diagram: the Service Action Pattern]

Let’s start by describing the Service Action Pattern.  In this pattern, we take the functionality that we want to make available to Alfresco and wrap it in a service object.  This is a well-established pattern in the Alfresco world, used extensively in Alfresco’s own public API.  Services like the NodeService, ActionService, and ContentService all take core functionality found in the Alfresco platform and wrap it in a well-defined service interface: a set of public APIs that return Alfresco objects like NodeRefs, Actions, and Paths, or Java primitives.  The service object is where all of our custom logic lives, and it provides a well-defined interface for other objects to use.  In many ways the service object acts as an adapter, translating back and forth between the domain-specific objects that your extension requires and Alfresco objects.  When designing a new service in Alfresco, I find it is a best practice to limit the types returned by the service layer to things that Alfresco natively understands.  If your service object method creates a new node, return a NodeRef, for example.

A custom service object on its own isn’t terribly useful, since Alfresco doesn’t know what to do with it.  This is where an Alfresco Action comes in handy.  We can use one or more Alfresco Actions to call the services that our service object exposes.  Creating an Action to call the service object has several advantages.  First, once you have an Action you can easily call it (and thus the underlying service object) from the Javascript API (more on this in a moment).  Second, it is easy to surface an Action in Alfresco Share, either for testing or so your users can call it directly.  Actions can also be triggered by folder rules, which is useful if you need to call some code when a document is created or updated.  Finally, Actions are registered with Alfresco, which makes them easy to find and call from other Java or server-side Javascript code via the ActionService.  If you want to do something to a file or folder in Alfresco, there is a pretty good chance that an Action is the right approach.
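To make that first advantage concrete, here is roughly what invoking a registered Action looks like from Alfresco’s server-side Javascript.  The action name and parameter are hypothetical stand-ins for whatever your extension registers.

// Server-side JavaScript: look up a registered Action by name and run it
// against the current document. "my-archive-action" and its parameter are
// hypothetical names for illustration.
var archiveAction = actions.create("my-archive-action");
archiveAction.parameters["storage-class"] = "glacier";
archiveAction.execute(document);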

Using the Service Action Pattern also makes it simple to expose your service object as a REST API.  Remember that Alfresco Actions can be located and called easily from the Javascript API.  The Javascript API also happens to be (IMHO) the simplest way to build a new Alfresco Web Script.  If you need to call your Action from another system (a very common requirement) you can simply create a web script that exposes your action as a URL and call away.  This does require a bit of boilerplate code to grab request parameters and pass them to the Action, which in turn will call your service object.  It isn’t too much and there are lots of great examples in the Alfresco documentation and out in the community.
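For a sense of scale, that boilerplate amounts to a controller script along these lines, paired with the usual web script descriptor and response template.  The file name, action name, and model field here are hypothetical.

// Web script controller (e.g. archive.post.json.js): pull the nodeRef off
// the request, hand the node to the Action, and put the outcome in the
// model for the response template.
var node = search.findNode(args.nodeRef);
if (node === null) {
  status.code = 404;
  status.message = "Node not found: " + args.nodeRef;
  status.redirect = true;
} else {
  var archiveAction = actions.create("my-archive-action");
  archiveAction.execute(node);
  model.result = "archived " + node.name;
}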

So why not just bake the code into the Action itself?  Good question!  First, any project of some complexity is likely to have a group of related functionality.  A good example can be found in the AWS Glacier Archive for Alfresco project we built a couple of years ago at an Alfresco hack-a-thon.  That project required Actions for archiving content, initiating a retrieval, and retrieving content.  All of these Actions are logically and functionally related, so it makes sense to group them together in a single service.  If you want the details of how Alfresco integrates with AWS Glacier, you only have to look at the service implementation class; the Action classes themselves are just sanity checks and wiring.  Another good reason to put your logic into a service class is reuse outside of Actions.  Actions carry some overhead, and depending on how you plan to use your logic you may want to make it available directly to a Behavior or expose it to the Alfresco Javascript API via a root scope object.  Both of these are straightforward if you have a well-defined service object.
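As an illustration of that last point, if the Glacier service were also registered as a Javascript root scope object (a hypothetical object named glacier here), script code could skip the Action machinery entirely:

// With the service registered as a JavaScript root scope object
// (hypothetical name "glacier"), scripts call the service directly;
// no Action lookup is required.
glacier.archive(document);

The Action remains the right front door for rules, Share, and remote callers, but the service object is what makes this kind of reuse cheap.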

I hope this helps you build your next awesome Alfresco platform extension.  I have found it a useful way to implement and organize my Alfresco projects.

Importing Legacy CSV Data into Elasticsearch


I use Salesforce at work quite a bit, and one of the things I find endlessly frustrating about it is the lack of good reporting functions.  Often I end up just dumping all the data I need into a CSV file and opening it up in Excel to build the reports I need.  Recently I was trying to run some analysis on cases and case events over time.  As usual, Salesforce “reports” (which are really little more than filtered lists with some limited predefined object joins) were falling well short of what I needed.  I’ve been playing around with Elasticsearch for some other purposes, such as graphing and analyzing air quality measurements taken over time.  I’ve also seen people on my team at Alfresco use Elasticsearch with Logstash for some content analytics work.  Elasticsearch and Kibana lend themselves well to analyzing the kind of time-series data I was working with in Salesforce.

The Data

Salesforce reporting makes it simple to export your data as a CSV file.  It’s a bit “lowest common denominator”, but it works for my purposes.  What I’m dumping into that CSV is a series of support case related events, such as state transitions, comments, etc.  What I want out of it is the ability to slice and dice that data in a number of ways to analyze how customers and support reps are interacting with each other.  How to get that CSV data into Elasticsearch?  Logstash.  Logstash has a CSV filter plugin that simplifies sucking CSV data into the index (because of course it does).

The Configuration

Importing CSV data into Elasticsearch using Logstash is pretty straightforward.  To do this, we need a configuration file for Logstash that defines where the input data comes from, how to filter it, and where to send it.  The example below is a version of the config file that I used.  It assumes that the output will be an Elasticsearch instance running locally on port 9200, and it will stash the data in an index named “supportdata”.  It will also echo the data to stdout for debugging purposes; that’s not recommended for production if you have a huge volume, but in my case it’s handy to see.  The filter section contains the list of columns that will be imported, and the filter options give you some fine-grained control over this behavior.

input {
  # Read the exported report from disk; start_position => "beginning"
  # tells Logstash to read the existing file contents rather than only
  # tailing new lines.
  file {
    path => ["/path/to/my/file.csv"]
    start_position => "beginning"
  }
}

filter {
  # Map each CSV field, in order, to a named column.
  csv {
    columns => [
      "EventDateTime",
      "User",
      "ElapsedTime",
      "CustomerName",
      "…",
      "…"
    ]
  }
}

output {
  # Index events into the local Elasticsearch instance, and echo them
  # to stdout so the import can be watched as it runs.
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "supportdata"
  }
  stdout {}
}
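With that saved to a file, running the import is just a matter of pointing Logstash at it, e.g. bin/logstash -f supportdata.conf from the Logstash install directory (the file name here is mine; use whatever you saved the config as).  Once documents start flowing, you can point Kibana at the “supportdata” index and start slicing.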

Debugging

No project goes perfectly the first time, and this was no exception.  I use a Mac for work, and when I first tried to get Logstash to import the data it would run, but nothing would show up in the index.  I turned on debug logging to see what was happening, and saw the following output:

[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.pipeline ] Pushing flush onto pipeline

This block just repeated over and over again.  So what was going wrong?  Obviously Logstash could see the file and read it.  It was properly picking up the fact that the file had changed, but it wasn’t picking up the CSV entries and moving them into Elasticsearch.  It turns out that Logstash’s file input is sensitive to line ending characters: it emits newline-delimited lines, so a file without Unix-style line endings looks like one giant line that never completes.  Simply opening the CSV in TextWrangler and saving it with Unix line endings fixed the problem.

Now that I can easily get my CSV-formatted event data into Elasticsearch, the next step is to automate all of this so that I can run my analysis without manually exporting the report.  It looks like this is possible via the Salesforce Reports and Dashboards REST API.  I’m just getting my head around that particular Salesforce API, and at first glance it looks like there may be a better way to do this than with CSV data.  I’m also looking into TreasureData as an option, since it appears to support pulling data from Salesforce and pushing it into Elasticsearch.  As that work progresses I’ll be sure to share whatever shakes out of it!