Content Services is in Alfresco’s DNA

I’m spending this week at Alfresco’s Sales Kickoff in Chicago, and having a blast.  There’s a lot of energy across the company about our new Digital Business Platform, and it’s great to see how many people instantly and intuitively get how Alfresco’s platform fits into a customer’s digital transformation strategy.  When content and process converge, and when we provide a world-class platform for managing and exposing both as a service, it’s a pretty easy case to make.  We have some great customer stories to drive the point home, too.  It’s one thing to talk about a digital strategy and how we can play there, but it’s another thing entirely to see it happen.

Content management is undergoing a shift in thinking.  Analysts have declared that ECM is dead, and content services is a better way to describe the market.  For my part, I think they are right.  Companies ahead of the curve have been using Alfresco as a content services platform for a long time.  I decided to do a little digging and see when Alfresco first added a web API to our content platform.  A quick look through some of our internal systems shows that Alfresco had working web services for content all the way back in 2006.  It was probably there earlier than that, but that’s one of the earliest references I could easily find in our systems.  That’s over a decade of delivering open source content services.  Here’s a quick view of the history of content services delivery channels in the product.

API History

I don’t think any other company in the market space formerly known as ECM can say that they have been as consistently service-enabled for as long as Alfresco.  It’s great to see the market moving to where we have been all along.

Open Source in an AI World. Open Matters More Now Than Ever.


Technological unemployment is about to become a really big problem.  I don’t think the impact of automation on jobs is in any doubt at this point, the remaining questions are mostly around magnitude and timeline.  How many jobs will be affected, and how fast will it happen?  One of the things that worries me the most is the inevitable consolidation of wealth that will come from automation.  When you have workers building a product or providing a service, a portion of the wealth generated by those activities always flows to the people that do the work.  You have to pay your people, provide them benefits, time off, etc.  Automation changes the game, and the people that control the automation are able to keep a much higher percentage of the wealth generated by their business.

When people talk about technological unemployment, they often talk about robots assuming roles that humans used to do.  Robots to build cars, to build houses, to drive trucks, to plant and harvest crops, etc.  This part of the automation equation is huge, but it isn’t the only way that technology is going to make some jobs obsolete.  Just as large (if not larger) are the more ethereal ways that AI will take on larger and more complex jobs that don’t need a physical embodiment.  Both of these things will affect employment, but they differ in one fundamental way:  Barrier to entry.

High barriers

Building robots requires large capital investments for machining, parts, raw materials and other physical things.  Buying robots from a vendor frees you from the barriers of building, but you still need the capital to purchase them as well as an expensive physical facility in which you can deploy them.  They need ongoing physical maintenance, which means staff where the robots are (at least until robots can do maintenance on each other).  You need logistics and supply chain for getting raw materials into your plant and finished goods out.  This means that the financial barrier to entry for starting a business using robots is still quite high.  In many ways this isn’t so different from starting a physical business today.  If you want to start a restaurant you need a building with a kitchen, registers, raw materials, etc.  The difference is that you can make a one time up-front investment in automation in exchange for a lower ongoing cost in staff.  Physical robots are also not terribly elastic.  If you plan to build an automated physical business, you need to provision enough automation to handle your peak loads. This means idle capacity when you aren’t doing enough business to keep your machines busy.  You can’t just cut a machine’s hours and reduce operating costs in the same way you can with people.  There are strategies for dealing with this like there are in human-run facilities, but that’s beyond the scope of this article.

Low barriers

At the other end of the automation spectrum is AI without a physical embodiment.  I’ve been unable to find an agreed-upon term for this concept of a “bodiless” AI.  Discorporate AI?  Nonmaterial AI?  The important point is that this category includes automation that isn’t a physical robot.  Whatever you want to call it, a significant amount of technological unemployment will come from this category of automation.  AI that is an expert in a given domain will be able to provide meaningful work delivered through existing channels like the web, mobile devices, voice assistants like Alexa or Google Home, IoT devices, etc.  While you still need somewhere for the AI to run, it can be run on commodity computing resources from any number of cloud providers or on your own hardware.  Because it is simply applied compute capacity, it is easier to scale up or down based on demand, helping to control costs during times of low usage.  Most AI relies on large data sets, which means storage, but storage costs continue to plummet to varying degrees depending on your performance, retrieval time, durability and other requirements.  In short, the barrier to entry for this type of automation is much lower.  It takes a factory and a huge team to build a complete, market-ready self-driving car.  You can build an AI to analyze data and provide insights in a small domain with a handful of skilled people working remotely.  Generally speaking, the capital investment will be smaller, and thus the barrier to entry is lower.

Open source democratizes AI

I don’t want to leave you with the impression that AI is easy.  It isn’t.  The biggest players in technology have struggled with it for decades.  Many of the hardest problems are yet to be solved.  On the individual level, anybody who has tried Siri, Google Assistant or Alexa can attest to the fact that while these devices are a huge step forward, they get a LOT wrong.  Siri, for example, was never able to respond correctly when I asked it to play a specific genre of music.  This is a task that a 10-year-old human can do with ease.  It still requires a lot of human smarts to build out fairly basic machine intelligence.

Why does open source matter more now than ever?  That was the title of this post, after all, and it’s taking an awfully long time to get to the point.  The short version is that open source AI technologies further lower the barriers to entry for the second category of automation described above.  This is a Good Thing because it means that the wealth created by automation can be spread across more people, not just those that have the capital to build physical robots.  It opens the door for more participation in the AI economy, instead of restricting it to a few companies with deep pockets.

Whoever controls automation controls the future of the economy, and open source puts that control in the hands of more people.

Thankfully, most areas of AI are already heavily colonized by open source technologies.  I’m not going to put together a list here; Google can find you more comprehensive answers.  Machine learning / deep learning, natural language processing, and speech recognition and synthesis all have robust open source tools supporting them.  Most of the foundational technologies underpinning these advancements are also open source.  The most popular languages for doing AI research are open.  The big data and analytics technologies used for AI are (mostly) open.  Even robotics and IoT have open platforms available.  What this means is that the tools for using AI for automation are available to anybody with the right skills to use them and a good idea for how to apply them.  I’m hopeful that this will lead to broad participation in the AI boom, and will help mitigate, to a small degree, the trend toward wealth consolidation that will come from automation.  It’s less a silver bullet, more of a silver lining.

Image Credit: By Johannes Spielhagen, Bamberg, Germany [CC BY-SA 3.0], via Wikimedia Commons

A Brief History of Screwing Up Software


Unless you live in a cave, you have heard by now about the recent massive AWS outage and how it kind of broke the Internet for a lot of people.  Amazon posted an account of what went wrong, and the root cause is the sort of thing that makes you cringe.  One typo in one command was all it took to take a huge number of customers and sites offline.  If you have been a software developer or administrator, or in some other way have had your hands on important production systems, you can’t help but feel some sympathy for the person responsible for the AWS outage.  Leaving aside the wisdom of giving one person the power and responsibility for such a thing, I think we have all lived in fear of that moment.  We’ve all done our fair share of dumb things during our tech careers.  In the interest of commiserating with that poor AWS engineer, here are some of the dumbest things I’ve done during my life in tech:

  1. Added four more layers of duct tape to the “infrastructure” that holds the internet together with several bad routing table choices.
  2. Had my personal site hacked and turned into a spam spewing menace.  Twice.  Pay attention to those Joomla and Drupal security advisories folks, those that would do you harm sure do!
  3. Relied on turning it off and then back on again to fix a deadlock I couldn’t find a root cause for.  Embedded systems watchdog FTW.
  4. Wrote my own implementation of an HTTP server.  I recommend everybody do this at least once just so you can see how good you have it.  Mine ended up being vulnerable to a directory traversal attack.  Thankfully a friend caught it before somebody evil did.
  5. Used VB6 for a real project that ended up serving 100x as many users as it was intended to.  Actually, let’s just expand that to “used VB6”.
  6. Done many “clever” things in my projects that came back to bite me later.  Nothing like writing code and then finding out a year later that you can’t understand what you did.  Protip:  Don’t try to be clever, be clear instead.
  7. Ran a query with a bad join that returned a Cartesian product.  On a production database that was already underpowered.  With several million rows in each table.
  8. Ran another query that inadvertently updated every row in a huge table when I only actually needed to update a handful.  Where’s that WHERE clause again?  Backups to the rescue!
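That runaway Cartesian product is easy to reproduce at toy scale.  Here’s a hypothetical sketch using SQLite (in-memory, with made-up tables): the same count query with and without the join condition.

```python
# Demonstrates how dropping a join condition turns a lookup into a
# Cartesian product. Table and column names are invented for the demo.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cases (id INTEGER, customer_id INTEGER)")
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO cases VALUES (?, ?)",
                [(i, i % 3) for i in range(1000)])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, "cust%d" % i) for i in range(3)])

# With the join condition: one row per case.
good = cur.execute("SELECT COUNT(*) FROM cases c "
                   "JOIN customers cu ON c.customer_id = cu.id").fetchone()[0]

# Without it: every case paired with every customer, so rows multiply.
bad = cur.execute("SELECT COUNT(*) FROM cases, customers").fetchone()[0]

print(good, bad)  # 1000 rows vs 1000 * 3 = 3000 rows
```

Scale those row counts up to a few million each and it’s easy to see how one missing ON clause can flatten an already underpowered production database.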

Anybody who spends decades monkeying around with servers and code will have their own list just like this one.  If they tell you they don’t, they are either too arrogant to realize it or are lying to you.  I’m happy to say that I learned something valuable from each and every example above, and it’s made me better at my job.

What’s your most memorable mistake?

Technological Unemployment in Alabama


Much has been written about the effect that technology has on jobs.  While some (myself included) have seen our careers benefit immensely from the march of technological progress, many people have not been so fortunate.  I grew up in the Detroit area from the late 70s to the early 90s, and even back then I remember people talking about how many workers were going to lose their jobs to machines.  I was young at the time, but the fear and uncertainty were palpable and were a regular topic of discussion in households connected to the automotive industry.  Companies rejoiced at the prospect of workers that never took a day off and did their jobs consistently, correctly and without complaint.  Human workers had a different view.  The robots were coming, an unstoppable mechanical menace that would decimate employment.

Robots vs. “Robots”

Fast forward to the present day.  In the words of the immortal Yogi Berra, it feels like “deja vu all over again”.  Thought leaders around the world are talking about the effect of automation on jobs, and the scope is orders of magnitude larger than it was before.  The rise of AI, machine learning, natural language processing, big data, cheap sensors and compute power and other technologies has expanded the scope of technological unemployment well past the assembly line.  Even jobs that were considered safe ten years ago are now witnessing at the very least a sea change in how they are done, if not outright extinction at the hands of technology.  Retail jobs face threats from cashier-free stores like Amazon Go.  Truck drivers watched helplessly as Otto completed its first driverless delivery.  Personal drivers such as chauffeurs, taxi drivers and ride share drivers are nervous about self-driving car technologies from Uber, Waymo, Ford, GM and others.  Projects like the open-source Farmbot offer a glimpse into the future of farming where seeding, watering and weeding tasks are carried out completely by machine.  Even previously labor-intensive tasks related to harvesting are being automated.  Timber and logging are on the block as well, with robots being deployed to take down trees and handle post-harvest processing.  Construction is facing automation too.  We already have robots that can lay bricks faster than a human worker, and companies around the world are experimenting with 3D printing entire structures.

While the physical manifestations of technology may be the most visible, they are not the only way technology is changing the face of work.  When people hear the word “robot” they usually think of a physical machine.  A Terminator.  An industrial robot that welds car frames.  A Roomba.  A self-driving car.  That’s only part of the story.  AI doesn’t need a physical body; it can live on the internet, basically invisible until you interact with it, and still do jobs a human does today.  We are already seeing this in many spaces, from healthcare supplemented by IBM Watson to fully automated, AI-driven translation services that in the future will handle language tasks that people do today.  In short, this isn’t just a “blue collar” issue, it’s much bigger than that.  Consider what happens when you can just ask Alexa, and don’t need to call a human expert for help with a problem.

Sweet Home Unemployment

What does this have to do with Alabama, specifically?  I had not really given much thought to the specific impact on the southeast until I first read about Futureproof Bama.  These folks have a mission “to help the state of Alabama prepare for the transition period for when robotics, artificial intelligence, and other autonomous innovations will make most traditional work obsolete.”  The point was really driven home for me when I had the chance to sit down with Taylor Phillips at a recent Awesome Foundation Awesome Hour event and talk automation, jobs and the future of work.  Why is this of particular concern to Alabama?  Will we be impacted more than other areas of the country?  Well, let’s take a look.

According to one published breakdown (calculated from BLS data), there are about 1.85 million Alabamians employed as of 2015.  3.7% of them are retail cashiers.  2% are freight, stock or material movers.  1.66% are heavy truck drivers.  Light truck, delivery, industrial truck and tractor operators combine to add about another 1.1%.  Tellers and counter clerks are another 1% or so.  Various construction trades are another couple of points, as is agriculture.  I could go on, but you can read the list for yourself.  It isn’t hard to tally up the affected professions on this list and get to 20-30% of the population that is squarely in the crosshairs of automation in the short term.  To put this in perspective, unemployment during the Great Depression peaked at around 25%.  Will Alabama fare worse than other areas?  I don’t know; an extensive analysis of affected employment categories across states would be necessary to answer that question with any kind of certainty.  Regardless of how Alabama stacks up to other states, it’s clear that we face a period of great change in how we view work.

Sounds grim, but you might ask, “Can’t these folks just find another job?”  Maybe, maybe not.  If the entire job category disappears, then no, at least not in their chosen field.  Even if a job doesn’t disappear entirely, a reduction in the number of available positions in a particular industry combined with a steady or increasing number of people that want that job will result in serious downward pressure on salaries.  Good for employers, bad for employees.  At a minimum, we’ll have a huge number of people that will need to retrain into a different job.  Even then, it’s unlikely that we’ll have enough labor demand across industries to soak up the excess supply.  All of this adds up to a very big problem that will require an equally big solution.  We cannot legislate automation out of existence, nor can we simply ignore the people that are left out of work.

If this concerns you, and you want to join a group of people that are working hard to get Alabama ahead of the curve, pop over to Futureproof Bama and get involved.

My Favorite New Things in the Alfresco Digital Business Platform


Everybody inside Alfresco has been busy getting ready for today’s launch of our new version, new branding, new web site, updated services and everything that comes along with it.  Today was a huge day for the company, with the release of Alfresco Content Services 5.2, a shiny new version of Alfresco Governance Services, our desktop sync client, the Alfresco Content Connector for Salesforce, a limited availability release of the Alfresco App Dev Framework and refreshes of other products such as our analytics solution, media management and AWS AMIs / Quickstarts.  Here are a few of my favorite bits from today’s releases (in no particular order).

The new REST API

Alfresco has always had a great web API, both the core REST API that was useful for interacting with Alfresco objects, and the open-standards CMIS API for interacting with content.  Alfresco Content Services 5.2 takes this to the next level with a brand new set of APIs for working directly with nodes, versions, renditions and running search queries.  Not only is there a new API, but it is easier than ever to explore what the API has to offer via the API Explorer.  We also host a version of the API Explorer so you can take a look without having to set up an Alfresco instance.  The new REST API is versioned, so you can build applications against it without worrying that something will change in the future and break your code.  This new REST API was first released in the 5.2 Community version and is now available to Alfresco Enterprise customers.  The API is also a key component of the Alfresco App Development Framework, or ADF.  Like all previous releases, you can still extend the API to suit your needs via web scripts.
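As a quick taste, here’s a minimal sketch of calling the versioned v1 API to list the children of the repository root.  The base URL and admin credentials are assumptions for a default local install; the API Explorer documents the full surface.

```python
# Sketch of a v1 REST API call against an assumed default local
# install (http://localhost:8080, stock admin account).
import base64
import json
import urllib.request

BASE = "http://localhost:8080/alfresco/api/-default-/public/alfresco/versions/1"
url = BASE + "/nodes/-root-/children"

def list_root_children(user="admin", password="admin"):
    """Return the entries under the repository root node."""
    req = urllib.request.Request(url)
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["list"]["entries"]
```

Because the version (`/versions/1`) is baked into the URL, code written against it keeps working even as newer API versions appear alongside it.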

Alfresco Search Services

Alfresco Content Services 5.2 comes with a whole new search implementation called Alfresco Search Services.  This service is based on Solr 6, and brings a huge number of search improvements to the Alfresco platform.  Search term highlighting, indexing of multiple versions of a document, category faceting and multi-select facets and document fingerprinting are all now part of the Alfresco platform.  Sharding also gets some improvements and you can now shard your index by DBID, ACL, date, or any string property.  This is a big one for customers supporting multiple large, distinct user communities that may each have different search requirements.  Unlike previous releases of Alfresco, search is no longer bundled as a WAR file.  It is now its own standalone service.
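To give a feel for what that looks like in practice, here’s a hedged sketch of a request body for the new search endpoint, combining an AFTS query, term highlighting, and a facet.  The endpoint path assumes a default local install, and the field choices are just examples.

```python
# Example payload for the v1 search endpoint; field names are
# illustrative, not prescriptive.
import json

SEARCH_URL = ("http://localhost:8080/alfresco/api/-default-"
              "/public/search/versions/1/search")

payload = {
    "query": {"query": "alfresco", "language": "afts"},
    "highlight": {                      # wrap matched terms for display
        "prefix": "<em>",
        "postfix": "</em>",
        "fields": [{"field": "cm:content"}],
    },
    "facetFields": {"facets": [{"field": "creator"}]},
}

body = json.dumps(payload)  # POST this as application/json to SEARCH_URL
```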

The Alfresco App Dev Framework

Over the years there have been a number of ways to build what your users need on top of the Alfresco platform.  In the early days this was the Alfresco Explorer (now deprecated), built with JSF.  The Share UI was added to the mix later, allowing a more configurable UI with extension points based on Surf and YUI.  Both of these approaches required you to start with a UI that Alfresco created and modify it to suit your needs.  This works well for use cases that are somewhat close to what the OOTB UI was built for, or for problems that require minimal change to solve.  For example, both Explorer and Share made it pretty easy to add custom actions, forms, or to change what metadata was displayed.  However, the further you get from what Share was designed to do, the more difficult the customizations become.

What about those cases where you need something completely different?  What if you want to build your own user experience on top of Alfresco content and processes?  Many customers have done this by building out their own UI in any number of different technologies.  These customers asked us to make it easier, and we listened.  Enter the Alfresco App Dev Framework, or ADF.  The ADF is a set of Angular2 components that make it easier to build your own application on top of Alfresco services.  There’s much more to it than that, including dev tooling, test tooling and other things that accelerate your projects.  The ADF is big enough to really need its own series of articles, so may I suggest you hop over to the Alfresco Community site and take a look!  Note that the ADF is still in a limited availability release, but we have many customers that are already building incredible things with it.

Admin Improvements

A ton of people put in a tremendous amount of work to get Alfresco Content Services 5.2 out the door.  Two new features that I’ve been waiting for are included, courtesy of the Alfresco Community and Alfresco Support.  The first is the trashcan cleaner, which can automate the task of cleaning out the Alfresco deleted items collection.  This is based on the community extension that many of our customers have relied on for years.  The second is the Alfresco Support Tools component.  Support Tools gives you a whole new set of tools to help manage and troubleshoot your Alfresco deployment, including thread dumps, profiling and sampling, scheduled job and active session monitoring, and access to both viewing logs and changing log settings, all from the browser.  This is especially handy for those cases where admins might not have shell access to the box on which Alfresco is running or have JMX ports blocked.  There’s more as well, check out the 5.2 release notes for the full story.

The Name Change

Ok, so we changed the name of the product.  Big deal?  Maybe not to some people, but it is to me.  Alfresco One is now Alfresco Content Services.  Why does this matter?  For one, it more accurately reflects what we are, and what we want to be.  Alfresco has a great UI in Share, but it’s pretty narrowly focused on collaboration and records management use cases.  This represents a pretty big slice of the content management world, but it’s not what everybody needs.  Many of our largest and most successful customers use Alfresco primarily as a content services platform.  They already have their own front end applications that are tailor made for their business, either built in-house or bought from a vendor.  These customers need a powerful engine for creating, finding, transforming and managing content, and they have found it in Alfresco.  The name change also signals a shift in mindset at Alfresco.  We’re thinking bigger by thinking smaller.  This new release breaks down the platform into smaller, more manageable pieces.  Search Services, the Share UI, Content Services and Governance Services are all separate components that can be installed or not based on what you need.  This lets you build the platform you want, and lets our engineering teams iterate more quickly on each individual component.  Watch for this trend to continue.

I’m excited to be a part of such a vibrant community and company, and can’t wait to see what our customers, partners and others create with the new tools they have at their disposal.  The technology itself is cool, but what you all do with it is what really matters.

Importing Legacy CSV Data into Elasticsearch


I use Salesforce at work quite a bit, and one of the things I find endlessly frustrating about it is the lack of good reporting functions.  Often I end up just dumping all the data I need into a CSV file and opening it up in Excel to build the reports I need.  Recently I was trying to run some analysis on cases and case events over time.  As usual, Salesforce “reports” (which are really little more than filtered lists with some limited predefined object joins) were falling well short of what I needed.  I’ve been playing around with Elasticsearch for some other purposes, such as graphing and analyzing air quality measurements taken over time.  I’ve also seen people on my team at Alfresco use Elasticsearch for some content analytics work with Logstash.  Elasticsearch and Kibana lend themselves well to analyzing the kind of time-series data that I was working with in Salesforce.

The Data

Salesforce reporting makes it simple to export your data as a CSV file.  It’s a bit “lowest common denominator”, but it will work for my purposes.  What I’m dumping into that CSV is a series of support case related events, such as state transitions, comments, etc.  What I want out of it is the ability to slice and dice that data in a number of ways to analyze how customers and support reps are interacting with each other.  How to get that CSV data into Elasticsearch?  Logstash.  Logstash has a filter plugin that simplifies sucking CSV data into the index (because of course it does).

The Configuration

Importing CSV data to Elasticsearch using Logstash is pretty straightforward.  To do this, we need a configuration file for Logstash that defines where the input data comes from, how to filter it, and where to send it.  The example below is a version of the config file that I used.  It assumes that the output will be an Elasticsearch instance running locally on port 9200, and will stash the data in an index named “supportdata”.  It will also output the data to stdout for debugging purposes.  Not recommended for production if you have a huge volume, but for my case it’s handy to see.  The filter section contains the list of columns that will be imported.  Using filter options you can get some fine-grained control over this behavior.

input {
  file {
    path => ["/path/to/my/file.csv"]
    start_position => "beginning"
    sincedb_path => "/dev/null"  # re-read the whole file on every run
  }
}

filter {
  csv {
    # Example column names; replace with the headers from your export.
    columns => ["CaseNumber", "Status", "CreatedDate", "Comment"]
    separator => ","
  }
}

output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "supportdata"
  }
  stdout { codec => rubydebug }
}

No project goes perfectly right the first time, and this was no exception.  I use a Mac for work, and when I first tried to get Logstash to import the data it would run, but nothing would show up in the index.  I turned on debugging to see what was happening, and saw the following output:

[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.inputs.file ] each: file grew:/path/to/my/file.csv: old size 0, new size 4674844
[DEBUG][logstash.pipeline ] Pushing flush onto pipeline

This block just repeated over and over again.  So what’s going wrong?  Obviously Logstash can see the file and can read it.  It is properly picking up the fact that the file has changed, but it isn’t picking up the CSV entries and moving them into Elasticsearch.  It turns out that Logstash is sensitive to line-ending characters.  Simply opening the CSV in TextWrangler and saving it with Unix line endings fixed the problem.
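If you’d rather not fix the file by hand in an editor every time, a few lines of Python will normalize the line endings before Logstash sees the file.  This is a generic sketch, nothing Salesforce-specific:

```python
# Convert Windows (\r\n) and classic Mac (\r) line endings to Unix
# (\n) so Logstash's file input sees one event per line.
def normalize_line_endings(src_path, dest_path):
    with open(src_path, "rb") as f:
        data = f.read()
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    with open(dest_path, "wb") as f:
        f.write(data)
```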

Now that I can easily get my CSV formatted event data into Elasticsearch, the next step is to automate all of this so that I can just run my analysis without having to deal with manually exporting the report.  It looks like this is possible via the Salesforce Reports and Dashboards REST API.  I’m just getting my head around this particular Salesforce API, and at first glance it looks like there is a better way to do this than with CSV data.  I’m also looking into TreasureData as an option, since it appears to support pulling data from Salesforce and pushing it into Elasticsearch.  As that work progresses I’ll be sure to share whatever shakes out of it!
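In the meantime, here’s a rough sketch of what fetching a report through that API looks like.  The instance URL and report ID below are placeholders, and it assumes you already have an OAuth access token; the Analytics API returns JSON rather than CSV, which conveniently sidesteps the line-ending problem:

```python
# Sketch of a Salesforce Analytics REST API call. INSTANCE and
# REPORT_ID are hypothetical placeholders; supply your own values.
import json
import urllib.request

INSTANCE = "https://yourinstance.salesforce.com"  # placeholder
REPORT_ID = "00OXXXXXXXXXXXXXXX"                  # placeholder report ID
API_VERSION = "v39.0"

report_url = "{}/services/data/{}/analytics/reports/{}".format(
    INSTANCE, API_VERSION, REPORT_ID)

def fetch_report(access_token):
    """Run the report synchronously and return its JSON results."""
    req = urllib.request.Request(report_url)
    req.add_header("Authorization", "Bearer " + access_token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```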


Open Source is the Surest and Shortest Path to Digital Transformation

Back in 2013, Mike Olson, a co-founder of Cloudera, famously stated that “No dominant platform-level software infrastructure has emerged in the last 10 years in closed-source, proprietary form.”  He’s absolutely right about that.  John Newton underscored this theme at a recent Alfresco Day event in NYC.  He shared this slide as a part of his presentation, which I think does a great job showing how much of our modern platforms are dependent on the open source ecosystem:


Platforms are more open today than they have ever been, with a few exceptions (I’m glaring at my iPhone as I write this).  Quite a few companies seem to have figured out the secret sauce of blending open platforms with proprietary value-adds to create robust, open ecosystems AND be able to pay the bills in the process.  This is very good news for you if you are pursuing a digital transformation strategy.

Why open source and open standards?

The advantages of open source are pretty well established at this point.  Open projects are more developer friendly.  They innovate faster.  They can fork and merge and rapidly change direction if the community needs that to happen (although there are good and bad forks).  Open has become the de facto way that the digital business works today.  I’d challenge you to find any team within your organization that isn’t using at least one open source project or library.  Open has won.  That’s the first big advantage of open source in digital transformation today:  It’s ubiquitous.  You can find a platform or component to fill just about any need you have.

Open is also faster to try, and removes a lot of friction when testing out a new idea.  Effective digital transformation relies on speed and agility.  It’s a huge advantage to simply pull down a build of an open source technology you want to try out, stand it up and get to work.  That allows bad ideas to fail fast, and good ideas to flourish immediately.  Since testing ideas out is effectively free in terms of dollar cost, and cheap in terms of time and cognitive investment, I think we also tend to be more willing to throw it out and start over if we don’t get the results we want.  That’s a good thing, as it ultimately leads to less time spent trying to find a bigger hammer to slam that square peg into a round hole.  If you decide to go forward with a particular technology, you’ll find commercial organizations standing behind it with support and value-added components that can take an open source core to the next level.

If digital transformation relies on speed of innovation, then open technologies look even more appealing.  Why do open source projects tend to out-innovate their purely proprietary competitors?  There are probably a lot of reasons.  An open project isn’t limited to contributors from only one company.  Great ideas can come from anywhere and often do.  At their best, large open source projects function as meritocracies.  This is especially true of foundational platform technologies that may have originated at or get contributions from tech leaders.  These are the same technologies that can power your own digital transformation.

Open source projects also make the pace of innovation easier to manage since you get full transparency into what has changed from version to version, and visibility into the future direction of the project.  Looking at pending pull requests or commits between releases gives you a view into what is evolving in the project so that you can plan accordingly.  In a very real sense, pursuing a digital transformation strategy using open technologies forces you to adopt a modular, swappable, services driven approach.  Replacing a monolithic application stack every cycle is not possible, but replacing or upgrading individual service components in a larger architecture is, and open source makes that easier.

Software eats the world, and is a voracious cannibal

There is a downside to this pace of change, however.  Because open source projects innovate so quickly, and because the bar to creating one is so low, we often see exciting projects disrupted before they can really deliver on their promise.  Just when the people responsible for architecture and strategy start to fully understand how to exploit a new technology, the hype cycle begins on something that will supersede it.  Nowhere is this as bad as it is in the world of JavaScript frameworks where every week something new and shiny and loud is vying for developers’ limited time and attention.  Big data suffers from the same problem.  A few years ago I would have lumped NoSQL (I hate that term, but it has stuck) databases into that category as well, but the sector seems to have settled down a little bit.

There is also a risk that an open source technology will lose its way, lose its user and developer base and become abandonware.  Look for those projects that have staying power.  Broad user adoption, frequent commits and active discussions are all good signs.  Projects that are part of a well established organization like the Apache Software Foundation are also usually a good bet.  Apache has a rigorous incubation process that a project must complete before becoming a top-level project, and this drives a level of discipline that independent projects may lack.  A healthy company standing behind the project is another good sign, as that means there are people out there with a financial interest in the project’s success.

Simply using open source projects to build your platform for transformation is no guarantee of success, but I would argue that carefully selecting the right open components does tilt the odds in your favor.

You Cannot Succeed at Digital Transformation Without Planning for Scale

TL;DR:  Digital transformation == scale, just by its nature.

Digital transformation affects all areas of a business, from the way leadership thinks of opportunities to the way developers build applications, and it carries challenges throughout that chain.  One of the biggest challenges for IT will come from achieving scale, often in unexpected places.

Why does digital transformation automatically mean scale?

Looking back at my last post on the journey to digital transformation, there are a few points where it should be obvious that you should be prepared to scale up.  In the digitization phase, for example, it makes sense to plan for managing a large amount of content and metadata.  Whether you are migrating from a legacy system, consolidating multiple repositories or ingesting a bunch of paper, your target repository will need to be ready to handle not only what you are bringing in today but what you plan to create and manage for the next several years.  Deploying in the cloud eases this burden significantly, freeing you from provisioning a ton of storage or database capacity that will sit idle until it is needed, or from adding capacity to an on-premises solution later.

Digitalization also drives the need to scale up.  Processes that were once done completely manually now get done via software.  Along the way a ton of useful information is captured.  Not only the information required to complete the process, such as attached documents, form data, assigned user, etc, but also metadata about the process itself.  How long did it take for a specific step to be completed?  Was it reassigned?  To whom and how often?  What is the current active task?  How many instances of each process are in flight?  All of this data is collected in a process management system.  The more processes move from manual to automated or managed execution, the bigger this pool becomes.  The net effect is an explosion in the amount of data that needs to be handled.

If digitization of content and digitalization of process lead to the need to scale, achieving digital transformation takes the problem and dials it up to 11.  Digital transformation will flip things around and turn more people that were previously consumers of information into producers, whether they realize it or not.  An employee working with a digital process may be handling much the same types of information as before transformation, but behind the scenes there is a lot more data being created.

Alfresco’s platform is built for this kind of scale in both content and process.  It is built on proven, scalable and performant open source technologies, and has been deployed by thousands of customers around the globe in support of large, business critical applications.  Alfresco provides guidance in several areas to help you size your deployment, build in the cloud, and make smart decisions about how and when to scale.

What about those unexpected areas of scale?

Scaling your content and process platform as a part of a digital transformation strategy is expected from day one, and should be part of the roll out and maintenance plan built before the first application goes live.  It may start with scaling content and process technology, but it does not end there.  Let’s look at some common drivers of digital transformation.  A few days spent reading a lot of articles, literature and opinion on digital transformation yields a wealth of reasons why companies might pursue it:

  • Improve the customer experience and become more customer centric
  • Get leaner, meaner and more efficient
  • Make better business decisions
  • Respond to an increased pace of technological change
  • Counter new competitive threats and seize new market opportunities
  • Meet the demand for real time information and insights

Take a look at that list.  Achieving any of these things will require more information to be captured and analyzed.  Getting more customer centric means understanding what your customers need, where they are dissatisfied with the current experiences you offer them and what you can do to improve them.  Becoming leaner and more efficient requires detailed metrics about processes so waste or delays can be identified and trimmed.  Better business decisions and real time information mean drinking from a firehose of data from across the business.

Achieving digital transformation requires you to plan for scale, and not just where you might expect it.  It doesn’t just require you to plan for more content and more processes, but also for how to handle the data about those things that you will need to capture and analyze.

The non-technical side of scale

Ultimately digital transformation is not a technology problem, it’s a business problem.  It is unsurprising, then, that we’ll be faced with challenges that we didn’t expect as we scale up that have nothing to do with technology.  Take, for example, support.  If your digital transformation rests on open APIs provided by a stack of homegrown, cloud and vendor provided services, how do you route support tickets?  If a user reports a problem, how can you narrow down the source and make sure that it gets handled by the right team?  A support team can be quickly overwhelmed if they need to sift through a dozen irrelevant error reports to find the ones that they can actually address.  The more services you rely on, the harder this problem becomes.  This is where detailed monitoring of the service layer becomes important.  Guess what that creates?  More data.  If you are using Alfresco technologies for content, process and governance, you have several options for keeping tabs on your services.

There are other non-technical areas that will be affected by scale as well.  Documentation and discovery, for example.  As the number of services rolled out in support of transformation increases, developers and business users alike need easy ways to find these services, and to understand how they work.  This in itself becomes another service.  Change management is another area that a business needs to be prepared to scale up.  Digital transformation increases the pace of change in an organization by enabling more rapid response to changing business conditions or new opportunities.  Without a solid framework in place to evaluate, decide on, execute and communicate change, digital transformation will have a hard time getting the traction it needs.

If you take away one thing from this and other articles about digital transformation, it’s this:  Achieving transformation requires scale from your systems, processes and people, and not just those that deal directly with technology.  Don’t underestimate it and find yourself with a plan that cracks under the weight of change.

Digital Transformation and the Role of Content and Process

I recently had the opportunity to go to NYC and speak on architecting systems for digital transformation.  It was a terrific day with our customers and prospects, as well as the extended Alfresco crew and many of our outstanding partners.  This was the first time I’ve delivered this particular talk so I probably stumbled over my words a few times as I was watching the audience reaction as much as the speaker notes.  One of the things I always note when speaking is which slides prompt people to whip out their phones and take a picture.  That’s an obvious signal that you are hitting on something that matters to them in a way that caught their attention, so they want to capture it.  In my last talk, one of the slides that got the most attention and set the cameras snapping pictures was this one:



Digitization of content is the whole reason ECM as a discipline exists in the first place.  In the early days of ECM a lot of use cases centered around scanning paper as images and attaching metadata so it could be easily found later.  The drivers for this are obvious.  Paper is expensive, it takes up space (also expensive), it is hard to search, deliver and copy and you can’t analyze it en masse to extract insights.  As ECM matured, we started handling more advanced digital content such as PDFs and Office documents, videos, audio and other data, and we started to manipulate them in new ways.  Transforming between formats, for example, or using form fields and field extraction.  OCR also played a role, taking those old document image files and breathing new life into them as first class digital citizens.  We layered search and analytics on top of this digital content to make sense of it all and find what we needed as the size of repositories grew ever larger.  Digitization accelerated with the adoption of the web.  That paper form that your business relied on was replaced by a web form or app, infinitely malleable.  That legacy physical media library was transformed into a searchable, streamable platform.

What all of these things have in common is that they are centered around content.


Simply digitizing content confers huge advantages on an organization.  It drives down costs, makes information easier to find and allows aggregation and analysis in ways that legacy analog formats never could.  While all of those things are important, they only begin to unlock the potential of digital technologies.  Digitized content allows us to take the next step:  Digitalization.  When the cost of storing and moving content drops to near zero, processes can evolve free from many of the previous constraints.  BPM and process management systems can leverage the digitized content and allow us to build more efficient and friendlier ways to interact with our customers and colleagues.  We can surface the right content to the right people at the right time, as they do their job.  Just as importantly, content that falls out of a process can be captured with context and managed as a record if needed.  Now instead of just having digitized content, we have digitalized processes that were not possible in the past.  We can see, via our process platform, exactly what is happening across our business processes.  Processes become searchable and can be analyzed in depth.  Processes can have their state automatically affected by signals in the business, and can signal others.  Our business state is now represented in a digital form, just like our content.

If digitization is about content, digitalization is about process.

Digital Transformation

The union of digitized content and digital processes is a powerful combination, and that helps create the conditions for digital transformation.  How?

In my opinion (and many others) the single most important thing to happen to technology in recent memory is the rise of the open standards web API.  APIs have been around on the web for decades.  In fact, my first company was founded to build out solutions on top of the public XML based API provided by DHL to generate shipments and track packages.  That was all the way back in the very early 2000s, a lifetime ago by tech standards.  Even then though, people were thinking about how to expose parts of their business as an API.

One of the watershed moments in the story of the API happened way back in the early 2000s as well, at Amazon.  By this point everybody has read about Amazon’s “Big Mandate”.  If you haven’t read about it yet, go read it now.  I’ll wait.  Ok, ready?  Great.  So now we know that the seeds for Amazon’s dominance in the cloud were planted over a decade and a half ago by the realization that everything needs to be a service, that the collection of services that you use to run your business can effectively become a platform, and that platform (made accessible) changes the rules.  By treating every area of their business as part of the platform, all the way down to the infrastructure that ran them, Amazon changed the rules for everybody.

How does this tie into the first two paragraphs?  What about content and process?  I’m glad you asked.  Content and process management systems take two critical parts of a business (the content it generates and the processes that it executes) and surface them as an API.  Want to find a signed copy of an insurance policy document?  API.  Want to see how many customers are in your sign up pipeline and at what stage?  API.  Want to see all of the compliance related records from that last round of lab testing?  API.  Want to release an order to ship?  API.  Want to add completed training to somebody’s personnel record?  API.  You get the idea.  By applying best practices around content and process management systems, you can quickly expose large chunks of your business as a service, and begin to think of your company as a platform.

This is transformative.  When your business is a platform, you can do things you couldn’t do before.  You can build out new and better user experiences much more quickly by creating thin UI layers on top of your services.  Your internal teams or vendors that are responsible for software that provides the services can innovate independently (within the bounds of the API contract) which immediately makes them more agile.  You can selectively expose those services to your customers or partners, enabling them to innovate on top of the services you provide them.  You can apply monitoring and analytics at the service layer to see how the services are being used, and by whom (this is one, in fact, that I would argue you MUST do if you plan to orient yourself toward services in any meaningful way, but that’s a separate article), which adds a new dimension to BI.  This is the promise of digital transformation.

There’s certainly more to come on this in the near future as my team continues our own journey toward digital transformation.  We are already well down the path and will share our insights and challenges as we go.


Do’s and Don’ts of Alfresco Development

Recently I had the pleasure of going up to NYC to speak with Alfresco customers and prospects about architecting Alfresco solutions and how to align that with efforts around digital transformation.  Part of that talk was a slide that discussed a few “do’s and don’ts” of designing Alfresco extensions.  Somebody suggested that I write down my thoughts on that slide and share it with the community.  So…  Here you go!


Stick with documented public APIs and extension points.

In early versions of Alfresco it wasn’t exactly clear which parts of the API were public (intended for use in extensions) or private (intended for use by Alfresco engineering).  This was mostly fixed in the 4.x versions of the product.  The ambiguity was mostly a problem for people building things out on the Alfresco Java API.  Today, the Java API is fully annotated so it’s clear what is and is not public.  The full list is also available in the documentation.  Of course Alfresco’s server side JavaScript API is public, as is the REST API (both Alfresco core and CMIS).  Alfresco Activiti has similar API sets.

Leverage the Alfresco SDK to build your deployment artifacts

During my time in the field I saw some customers and Alfresco developers roll their own toolchain for building Alfresco extensions using Ant, Gradle, Maven or in some cases a series of shell scripts.  This isn’t generally recommended these days, as the Alfresco SDK covers almost all of these cases.  Take advantage of all of the work Alfresco has done on the SDK and use it to build your AMPs / JARs and create your Alfresco / Share WAR file.  The SDK has some powerful features, and can create a complete skeleton project in a few commands using Alfresco archetypes.  Starting Alfresco with the extension included is just as simple.

Use module JARs where possible, AMPs where not

For most of Alfresco’s history, the proper way to build an extension was to package it as an AMP (Alfresco Module Package).  AMPs (mostly) act like overlays for the Alfresco WAR, putting components in the right places so that Alfresco knows where to find them.  Module JARs were first added in Alfresco 4.2, have undergone significant improvements since their introduction, and are now referred to as “Simple Modules” in the documentation.  Generally, if your extension does not require a third party JAR, a Simple Module is a much cleaner, easier way to package and deploy an extension to Alfresco.
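
To illustrate, a Simple Module JAR mostly just carries a module descriptor (plus any Spring context files and resources) under a well-known classpath location.  A minimal sketch follows; the module id and values here are hypothetical examples, and the exact layout for your version is described in the Alfresco documentation:

```properties
# Hypothetical descriptor, packaged inside the JAR at
# alfresco/module/org.example.hello/module.properties
module.id=org.example.hello
module.title=Hello Module
module.description=Example Simple Module packaged as a plain JAR
module.version=1.0.0
```

Drop the JAR onto the repository classpath (the supported location depends on your version and deployment), and the module is registered at startup without any WAR surgery.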

Unit test!

This should go without saying, but it’s worth saying anyway.  In the long run unit testing will save time.  The Alfresco SDK has pretty good support for unit tests, and projects built using the All-In-One Archetype have an example unit test class included.

Be aware of your tree: depth and degree matter

At its core, the Alfresco repository is a big node tree.  Files and folders are represented by nodes, and the relationships between these things are modeled as parent-child or peer relationships between nodes.  The depth of this tree can affect application design and performance, as can the degree, or number of child nodes.  In short, it’s not usually a best practice to have a single folder that contains a huge number of children.  Not only can this make navigation difficult if using the Alfresco API, but it can also cause performance problems if an API call tries to list all of the children in a gigantic folder.  In the new REST APIs the results are automatically paged, which mitigates this problem.
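
As a sketch of what paging looks like in practice, the v1 REST API’s children endpoint accepts skipCount and maxItems parameters, so a client can walk a large folder one page at a time.  The host, node id and page size below are hypothetical examples:

```java
// Sketch: paging through a folder's children via the v1 REST API
// rather than asking for everything in one call.
public class ChildrenPaging {
    // Build a paged children request URL for the public v1 REST API.
    static String childrenUrl(String base, String nodeId, int skipCount, int maxItems) {
        return base + "/alfresco/api/-default-/public/alfresco/versions/1/nodes/"
                + nodeId + "/children?skipCount=" + skipCount + "&maxItems=" + maxItems;
    }

    public static void main(String[] args) {
        int pageSize = 100;
        // Hypothetical loop over the first three pages; a real client would
        // issue the GET for each URL and stop when hasMoreItems is false.
        for (int skip = 0; skip < 300; skip += pageSize) {
            System.out.println(childrenUrl("https://alfresco.example.com",
                    "some-node-id", skip, pageSize));
        }
    }
}
```

Clients of the older APIs don’t get this for free, which is one more reason to keep folder cardinality sane by design.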

Use the Alfresco ServiceRegistry instead of injecting individual services

When using the Alfresco Java API and adding new Spring beans there are two ways to inject Alfresco services such as NodeService, PermissionService, etc.  You can inject the individual services that you need, or you can inject the ServiceRegistry that allows access to all services.  On the one hand, injecting individual services makes it easy to see from the bean definition exactly which services are used without going to the code.  On the other hand, if you need another service you need to explicitly inject it.  My preference is to simply inject the registry.  A second reason to inject the registry is that you’ll always get the right version of the service.  Alfresco has two versions of each service.  The first is the “raw” service, which follows a naming convention that starts with a lower case letter.  An example is nodeService.  These services aren’t recommended for extensions.  Instead, Alfresco provides an AOP proxy that follows a naming convention starting with an upper case letter, such as NodeService.  This is the service that is recommended, and it is the one that the service registry will return.
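
As a sketch, registry-style injection looks like this in a Spring context file.  The bean id and class are hypothetical examples; “ServiceRegistry” is the id of Alfresco’s registry bean:

```xml
<!-- Hypothetical extension bean: one injection point covers every service -->
<bean id="org.example.myExtensionBean" class="org.example.MyExtensionBean">
    <property name="serviceRegistry" ref="ServiceRegistry"/>
</bean>
```

The bean’s Java class then calls, for example, serviceRegistry.getNodeService(), and gets back the proxied NodeService rather than the raw nodeService.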


Modify core Alfresco classes, web scripts, or Spring beans

Alfresco is an open source product.  This means that you can, in theory, pull down the source, modify it as much as you like and build your own version to deploy.  Don’t do that.  That’s effectively a fork, and at that point you won’t be able to upgrade.  Stick with documented extension points and don’t modify any of the core classes, Freemarker templates, JavaScript, etc.  If you find you cannot accomplish a task using the defined extension points, contact Alfresco support or open a discussion on the community forum and get an enhancement request in so it can be addressed.  Speaking of extension points, if you want a great overview of what those are and when to use them, the documentation has you covered.

Directly access the Alfresco / Activiti database from other applications

Alfresco and Activiti both have an underlying database.  In general, it is not a best practice to access this database directly.  Doing so can exhaust connection pools on the DB side or cause unexpected locking behavior.  This typically manifests as performance problems but other types of issues are possible, especially if you are modifying the data.  If you need to update things in Alfresco or Activiti, do so through the APIs.

Perform operations that will create extremely large transactions

When Alfresco is building its index, it pulls transactions from the API to determine what has changed, and what needs to be indexed.  This works well in almost all cases.  The one case where it may become an issue is if an extension or import process has created a single transaction which contains an enormous change set.  The Alfresco Bulk Filesystem Import Tool breaks down imports into chunks, and each chunk is treated as a transaction.  Older Alfresco import / export functionality such as ACP did not do this, so importing a very large ACP file may create a large transaction.  When Alfresco attempts to index this transaction, it can take a long time to complete and in some cases it can time out.  This is something of an edge case, but if you are writing an extension that may update a large number of nodes, take care to break those updates down into smaller parts.
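
The batching idea is generic enough to sketch without any Alfresco dependencies.  A minimal example, assuming each batch would then be run in its own transaction (for instance via Alfresco’s RetryingTransactionHelper); the batch size and node list here are arbitrary illustrations:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a large set of updates into fixed-size batches so that
// each batch commits as its own, reasonably sized transaction.
public class BatchUpdater {
    // Return views of the input list, each at most batchSize items long.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2500 nodes to update, 1000 per transaction.
        List<Integer> nodeIds = new ArrayList<>();
        for (int i = 0; i < 2500; i++) nodeIds.add(i);
        for (List<Integer> batch : partition(nodeIds, 1000)) {
            // ...begin a transaction, update the nodes in this batch, commit...
            System.out.println("committing batch of " + batch.size());
        }
    }
}
```

Three modest transactions index quickly and independently, where one 2500-node transaction might stall the indexer.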

Code yourself into a corner with extensions that can’t upgrade

Any time you are building an extension for Alfresco or Activiti, pay close attention to how it will be upgraded.  Alfresco makes certain commitments to API stability, what will and will not change between versions.  Be aware of these things and design with that in mind.  If you stick with the public APIs in all of their forms, use the Alfresco SDK and package / deploy your extensions in a supported manner, most of the potential upgrade hurdles will already be dealt with.

Muck around with the exploded WAR file.  

This one should go without saying, but it is a bad practice to modify an exploded Java WAR file.  Depending on which application server you use and how it is configured, the changes you make in the exploded WAR may be overwritten by a WAR redeployment at the most inopportune time.  Instead of modifying the exploded WAR, create a proper extension and deploy that.  The SDK makes this quick and easy, and you’ll save yourself a lot of pain down the road.

This list is by no means comprehensive, just a few things that I’ve jotted down over many years of developing Alfresco extensions and helping my peers and customers manage their Alfresco platform.  Do you have other best practices?  Share them in the comments!