Making the Case for Open Source Hardware, Open Data, and Open AI


This is a longish article.  TL;DR: If we don’t get a handle on open hardware, open data policies, and open access to intelligence, we’ll be back to buttoned-up, closed ecosystems in no time.  It may already be too late.

Open Source has won in the software world.

Let that sink in for a minute.  Do you agree?  Disagree?  Is that statement too bold?  Look across the spectrum of tools that we use to build the software and services that run the world.  Now, look at how many of them are open.  From the operating system up to protocols and service infrastructure to libraries to the toolkits we use to build user experiences, you’ll find open source and open standards everywhere you look.  Open has become the de facto way we build important pieces of software.

All this open source love in the software sphere is great, and it is changing the world, but I can’t help but feel like we are just beginning this journey.  In the early days of technology, it was open by default because so much of the innovation was coming from universities and government projects which were (arguably) used to open collaboration models.  Even your electronics came with a schematic (basically the hardware source code) so they could be repaired.  Commercialization and disposable products led to much of the software that ran the world becoming closed, and hardware designs that were no longer user serviceable.  Closed was harder to reverse engineer and imitate, so it was an attractive way to distribute technology for a commercial product.

This transition was not without its pain when engineers and technicians that were used to being able to crack open a product and fix it discovered that they could no longer do so.  It is said that this pain is, in part, what led to the creation of the FSF and GNU project, the history of which needs a book, not a blog post.  During the dark days of the 1980s and 90s (when I started my career as a developer) closed products were how most work got done.  We got poorly documented APIs into a black box of logic, and it was a Bad Thing.

The Internet turned that tide back.  Suddenly two developers on opposite ends of the earth with an idea could collaborate.  They could communicate.  They could rally others to their cause.  We needed common platforms to build things on in this new world of connectivity.  We wanted to tinker, to experiment, and closed systems are directly opposed to those goals.  We got GNU/Linux.  We got Apache and all the projects that the foundation incubated.  Open was back in the mainstream.

Fast forward to 2018, and for building stuff on the web, open is the default choice.

While the Internet brought new life to open source, it also created a time of rapidly contracting, overlapping technology cycles.  Terminal or standalone apps were replaced with client-server architectures which rapidly gave way to the web, which in turn quickly shifted to mobile as the dominant way to interact.  The next disruption is already underway; the shift to IoT and ubiquitous computing, led by voice platforms like Alexa, Google Home, and others.  What does open source mean in this world?

In a world of ubiquitous computing, technology is everywhere and nowhere at the same time.  It isn’t something you need to consciously pick up and use, it is baked into the things you already touch every day.  Your car.  Your glasses.  Your home.  Your clothing and shoes.  Your watch.  Your kid’s toys.  Your stereo.  Your appliances.  Computing will respond to your voice, gestures, expressions, location, and data from all your other sensors as well as broader sets of data about the world.  Done right, it will be a seamless experience that brings you the information you need when and where you need it and will let you affect your world from wherever you happen to be.  Achieving this means a boatload of new hardware, it means unfathomable volumes of data being captured and made available across all the platforms you use, and it means intelligence that can make sense of it all.

While we may have open source dominance in many parts of the software world, that won’t be enough in the very near future.  Moving into a world of ubiquitous computing while maintaining open source relevance means we need open components up and down the stack.  This starts at the bottom, with hardware.  We have a plethora of options today, thanks to projects and platforms like the Arduino, the Raspberry Pi, BeagleBoard, Particle, and others. This is not intended to be an exhaustive list of open hardware platforms, just a few examples.  If I left out your favorite, please leave a comment! These aren’t all fully open source, as some depend on hardware that may be patent encumbered or proprietary.  They are open in other important ways though, from board designs to developer toolchains to the software that runs on them.  With these types of components and a good idea, a modestly funded group or dedicated individual can build and launch something meaningful.

Am I saying that in some kind of open hardware utopia that everybody will hack together their own smartwatch in their kitchen?  No, no I’m not.  What I am saying is that these open source hardware options drop the barrier to entry for Internet-connected hardware down to where it needs to be (as low as possible), just like the Apache HTTP server did for people serving up HTML in the early days of the web.  These new tinkerers will birth tomorrow’s products or find other ways to participate in ubiquitous computing through their experiences.  If the barriers to entry aren’t kept low, only larger companies will be able to play in this space, and that would be a Bad Thing.  Say hello to the 1980s all over again.  Hopefully the hair doesn’t come back too.

If open hardware is the foundation, what about the data that it generates?  The data and the actions driven by that data is where the value exists, after all.  What good is an Internet-connected shower head if it can’t text you when you are about to jump in and scald yourself, while also queueing up care instructions for first degree burns on your television and ordering a new bottle of aloe lotion from Amazon?  Again, there is a robust ecosystem of open source tools for collecting and managing the data so we have somewhere to start.  You can even run these open source tools on your cloud provider of choice, which again keeps the barrier to entry nice and low.  Say what you will about utility computing, but it sure makes it cheap and easy to try out a new idea.

This is all well and good for the things we might build ourselves, but those are not going to be the only things that exist in our world of ubiquitous computing.  We’ll have devices and services from many vendors and open projects running in our lives.  Given that the data has such value, how can we ensure that data is open as well?  It is (or should be) ours, after all.  We should get access to it as if it were a bank balance, and decide when and how we share it with others.  Open source can inform this discussion as well through the development and application of open data policies.  I envision a future where these policies are themselves code that runs across providers, and they can be forked and merged and improved by the community of people to whom they apply.  The policy then becomes both a mechanism for opening data up to a myriad of uses that we control and a form of open source code itself.  This could enable the emergence of new marketplaces, where we set prices for the various types of data that we generate and companies bid with something of value (services, cash, whatever) to access it.  This happens today, albeit with limited scope.  If you use Facebook, Gmail, Instagram, LinkedIn or any other freebie like these, you are already buying a service with your data in a siloed sort of way.  Your data is your currency and the product that these companies resell.  Their service is the price they pay to use your data.
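
To make that idea slightly more concrete, here is a purely hypothetical sketch of what a forkable, machine-readable data policy might look like.  Nothing below is a real standard or product; every field name is invented for illustration only.

const myDataPolicy = {
  owner: 'user-1234',                 // the person the data belongs to
  version: '1.2.0',                   // policies could be versioned, forked, and merged like code
  grants: [
    {
      dataType: 'location-history',   // category of data covered by this grant
      purpose: 'personalized-recommendations',
      providers: ['any'],             // which services may request access
      retention: 'P30D',              // ISO 8601 duration: keep for 30 days, then delete
      price: { type: 'service-credit', amount: 5 }   // what the requester owes in exchange
    },
    {
      dataType: 'health-metrics',
      purpose: 'research',
      providers: ['nonprofit-only'],
      retention: 'P0D',               // process and discard, never store
      price: { type: 'none' }
    }
  ]
};

The point is less the particular fields and more that a policy expressed this way could be read by machines, enforced by providers, and improved in the open by the people it covers.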

The final piece of the puzzle is intelligence.  That is, the tools to sift through all the data our lives generate, extract meaningful patterns and insights, and to act on the same.  Because so much of the AI world today is still straddling the academic and the practical, the open mentality has a strong foothold here much like it did when the Internet itself was emerging from research projects and becoming a commercial force.  Take a quick look around the software projects used in companies building their future on AI.  You’ll quickly find that many of the most important are open source.  That’s great and all, but without data and models the tools themselves are of little use.  Combining the open source intelligence tooling with an open data policy framework creates a future where open source matters.

The combination of programmatic open data policies and open intelligence is powerful.  Open data policies would make it possible for new competitors to create something amazing without needing to generate huge sets of data themselves, all they need are users excited enough about what they are building to agree to share the data that already exists.  Much like the market for open data, this could create a market for intelligence.  Instead of being tied to what our existing providers build and decide we need, we might opt to use intelligence services that are tailored to our lives.  Interested in health and wellness?  Use your data as currency to buy an intelligence service that pulls together all your data from across your other providers to suggest personalized ways to be healthier.  Music nut?  Maybe a different intelligence service that looks at parts of your data that correspond to your mood and puts together the perfect playlist.  Trying to save money?  How about an intelligence that analyzes all the waste in your life and suggests ways to improve it.  None of these things will reach their full potential without the ability to use data from across your ubiquitous computing experience.  Importantly, with open data policies, you are in control of how your data is used and you can shape that control collaboratively with others that want the same thing you do.

What happens if we don’t do this?  What happens if we continue to allow our data to be locked up in proprietary silos owned by what will rapidly become legacy providers?  If that trend continues we’ll be back where we started.  Closed ecosystems, black boxes of logic with an API, no ownership of our own data or our own future, and a bar set so high that new entrants into the market will need pockets deeper than most possess to get a foothold.  It is this last point that worries me the most, and where a continued commitment to open source will have the biggest impact.  As I pointed out earlier, I don’t expect that most people will have the time, skills or resources to build their own solutions end to end.  That’s not the point.  The point is keeping the barriers to entry as low as they can be so the next generation of innovations can be born in places other than the R&D labs of the world’s biggest companies.  The democratization of technology birthed the web as we know it and it would be a shame to lose that now.

What’s next?  How do we make this better future happen?  Fortunately, many of the pieces are already falling into place, and more are coming.  Groups like the Open Source Hardware Association (OSHWA) are defining what it means to be open source hardware.  The non-profit OpenAI research company has backing from the industry and publishes public papers and open source tools in the intelligence space.  The European General Data Protection Regulation (GDPR) contains important language about the right of access (article 15), right to erasure (article 17), and data portability (article 20) that puts ownership of your data back where it belongs.  With you.  Open source projects around big data, IoT, intelligence and other key technologies continue to thrive, and with a choice of utility computing providers you can spin them up without much upfront investment.  If an open future matters to you (and it should!), seek out and support these organizations.  Find a way to participate in the open source ecosystem around the work you do.  Support legislation that gives you control over your data.

This article isn’t meant to be gloomy; I think the future of open source is brighter than ever.  Realizing that future means we need to look carefully at ways to ensure things other than just the code are open.

“Teaching Open Source Practices, Version 4.0” by Libby Levi is licensed under CC BY 2.0

How to Get the Most out of a Consulting Engagement


In my previous life in corporate IT we engaged consultants on many different projects, so I thought I had a good handle on what it takes to make a consulting engagement work.  Then about six years ago I made the jump from the corporate IT world to working as a consultant and many of my assumptions were proven to be dead wrong.  Since that time I’ve delivered dozens of engagements as an individual contributor, and in the last year taken on leadership of Alfresco’s global consulting organization.  Now that I have seen the consulting world from several angles, I have come to realize that there is quite a bit that the companies that engage consultants can do to get the most out of their time.   A few do’s and don’ts:


  1. Make sure that your project team knows that the consultant is coming, and have them block off the time that the consultant will be engaged.  Consultants bill for their time, and every hour that they are idle is wasted money.
  2. Ensure that you are prepared for their arrival.  When you are planning the engagement, have a conversation about prerequisites.  What system access is needed?  Will on-premises resources be required?  Do you need to set up shell accounts?  AWS accounts?  A good consultant can give you a checklist of things to have ready when they start the engagement so that you can hit the ground running.
  3. Set up stretch goals for the engagement.  Good consultants do their best to make sure the time estimates for tasks are accurate.  Sometimes things progress faster than expected, and it pays to have a few stretch goals in mind if you find yourself with extra time.  Keep these off the statement of work though, and don’t add them to the engagement scope until it is absolutely certain that the original work is complete.  Speaking of scope…
  4. Be diligent about the scope when designing the engagement and stick with it once the engagement is underway.  Tangents can be productive, but they are often productive in the wrong areas.  Your consultant has likely done their engagement prep based on the scope defined in the statement of work, and shifting that scope will distract from the core work.  If you don’t know what you don’t know, consider a shorter discovery engagement first and use the findings to build a more complete scope.
  5. Outline your documentation requirements and build in time to complete them.  Is the work mostly code related?  If so, the documentation might be in the form of comments and completed code created in conjunction with your dev team.  Is the work architectural or more strategic in nature?  A more formal engagement report and executive summary might be in order.  Either way, make sure you and your consultant are aligned on expectations here and leave time to get it done.
  6. Entertain the idea of remote work.  Eliminating travel can keep the cost of an engagement down.  In addition, most consultants build in travel time for their on-site work.  If you go remote you can use those hours to get stuff done instead.
  7. Train up in anticipation of the project.  If your project involves launching a new software product or process, for example, it can pay dividends to do some self-study, online or in-person training to get a handle on the basics before your consultant shows up.  If your team has some solid background knowledge on the products being used, your consultant and team can focus on the hard questions instead of spending time reviewing the core concepts.  A little training also provides valuable context for the things that your consultant will tell you, making the puzzle pieces fit together that much easier.
  8. Have a long term support and implementation plan post-engagement.  How will you take what your consultant does and build on it?  How will you carry on with long term support of things they help you build?  How will you track your progress toward implementing the things they recommend?  How will you know if it is time to bring them back for a follow up?

Don’t:

  1. Set up a separate internal email address for your consultant.  They already have a perfectly good email address, and requiring them to check yet another inbox makes it more likely that something will get overlooked.  Calendar invites and other features work across pretty much all mail systems, so there is really no good reason to make them use an internal email (barring extreme security requirements, etc).  Requiring your consultant to use an internal email address also means that they will lose access to that mail history when they leave.  Without access to this critical historical information, they won’t be as effective should you ring them up with a question or want to re-engage.
  2. Require the consultant to use a computer provided by your company if it can be avoided.  Many consultants have a particular toolchain that they rely on to get their job done efficiently.  If you require them to use a company provided desktop or laptop, they will need to recreate this toolchain, which may take some time that could be better spent solving the problems they were brought in to solve.  In the cases where this cannot be avoided, get a list of software and other items that the consultant will need and make sure those items are set up ahead of their arrival.
  3. Hand off a list of tasks and then leave them to it.  Consultants are usually brought in because they have some kind of specialized knowledge or expertise that is lacking in an organization.  One of the most valuable things a consultant can do for you is knowledge transfer.  Having them work closely with your team ensures that when the consultant leaves some valuable new knowledge and skills have been left behind.
  4. Let a small issue block the engagement.  Servers down?  Now is a good time to talk strategy.  DBA out sick, can’t get a schema set up?  Time for a Q&A session with the dev team.  Part of managing an engagement is knowing when to punt and move on to other things to stay productive.
  5. Reschedule.  Consultants live and die by their utilization.  For short to medium term consultants, this means that we probably have a deep pipeline of bookings that are carefully choreographed to keep us busy.  If a client has to reschedule, it can throw the whole schedule into chaos.  If we are lucky we might be able to find a client that wants to switch weeks or one in the queue that is ready now.  If we aren’t lucky, we end up on the bench and then have to scramble to find an available slot for the client that needs to reschedule, often much later than the client wants.
  6. Expect your consultant to circumvent normal support and escalation procedures.  We can help you navigate support, and we know how to get things escalated to the right people, but we can’t just ring up engineering (in 99% of cases, anyway).  Same goes for the product roadmap.  We know it, we live it, but that stuff is planned months or years in advance.  What we can do is help you build things that are aligned with the future of the product that will let you adopt new features faster and minimize rework that might come with change.
  7. Expect your consultant to know everything off the top of their head.  We prepare for our engagements based on the defined scope, and questions outside that scope may take a little time to answer.  On the bright side, when you engage a good consultant you don’t just get that one person.  When we go into the field we have a lot of resources backing us up in the form of the other consultants in our practice, our extended professional network inside our company, support, internal knowledgebases, engineering tickets, and other tools.  Rest assured we will use everything at our disposal to get you the right answer, even if we can’t hand it over on the spot.

This list is by no means exhaustive, and I will probably add to it over time.  Consulting was one of the most fun jobs I’ve ever had, and today I’m humbled to get to lead such a skilled, dedicated global team.  It also helps that we have great clients!  Hopefully at least a few of these tips will help you to get the most value from your future consulting engagements.  To any of the other consultants out there, what are your best practices for customers that engage your services?

Making Chat Content Flow with Alfresco


Let’s start with an axiom:  In a modern business, chat is, as much as email, where business gets done.  Every company I have worked with or for in the past decade has come to increasingly rely on chat to coordinate activities within and across teams.  This is great, as chat provides a convenient, asynchronous way to work together.  It fits nicely in that niche between a phone call and an email in terms of urgency of response and formality.  It is no surprise, then, that the uptake of chat tools for business has been high.

I’m a big proponent of using chat for our teams, but using it has uncovered a few challenges.  One of them is managing the content that comes out of chats.  Just about every chat tool has a content sharing facility for sending docs back and forth.  That’s great, but what happens to that content once it is pasted into a chat?  Often that content becomes somewhat ephemeral, maybe viewed when it is dropped into the chat but then forgotten.  What happens when you have segments of a chat that are valuable and should be saved?  If you are using chat as a support channel, for example, that chat content may well form the foundation for knowledge that you want to capture and reuse.  If the chat is related to a deal that sales is working, you might want to capture it as part of your sales process so others know what is happening.

This sort of “knowledge leakage” in chat can be partially solved by the search functions in chat, but that is often limited.  Some tools can only go back a certain amount of time or a certain number of messages with the baked in search functions.  The quality of this search depends on which tool you are using and whether you are using a paid or free version of that tool.  Frustratingly, chat tools do not typically index the content of documents shared via chat.  This means you can probably search for the title of the document or the context in which it was shared, but not the content of the document itself.  Regardless of the quality of search, it only solves part of the problem.  Content in chat may be discoverable, but it isn’t easily shareable or captured in a form that is useful for attaching or including in other processes.  In short, chat content creates chaos.  How can we tame this and make chat a better channel for sharing, capturing, curating and finding real knowledge?

Within our Customer Success team at Alfresco we have done some small research projects into this problem, and have even solved it to a certain extent.  Our first crack at this came in the form of an application that captures certain chats from our teams and saves those chat logs into an Alfresco repository.  This is a great start, as it partially solves one of our problems.  Chat logs are no longer ephemeral, they are captured as documents and saved to Alfresco’s Content Services platform.  From there they are indexed, taggable and linkable, so we can easily share something that came up in the chat with others, in context.  This approach is great for capturing whole chats, but what about saving selected segments, or capturing documents that are attached to chats?

Solving both of these problems is straightforward with Alfresco’s content services platform, a good chat tool with a great API, and a little glue.  For this solution I have set out a few simple goals:

  1. The solution should automatically capture documents added to a chat, and save those documents to Alfresco.
  2. The solution should post a link to the saved document in the chat in which the document originated so that it is easy to find in the content repository.  This also ensures that captured chat logs will have a valid link to the content.  We could ask people to post a doc somewhere and then share a link in the chat, but why do that when we can make it frictionless?
  3. The solution should allow chats or segments of chats to be captured as a text document and saved to Alfresco.
  4. The solution should allow for searching for content without leaving the chat, with search results posted to the chat.

Looking at the goals above, a chat bot seems like a fairly obvious solution.  Chat bots can listen in on a chat channel and act automatically when certain things happen in the chat or can be called to action discretely as needed.  A simple chat bot that speaks Alfresco could meet all of the requirements.  Such a bot would be added to a chat channel and listen for documents being uploaded to the chat.  When that happens the bot can retrieve the document from the chat, upload it to an Alfresco repository, and then post the link back to the chat.  The bot would also need to listen for itself to be called upon to archive part of a chat, at which point it retrieves the specified part of the chat from the provider’s API, saves it to Alfresco and posts a link back to the chat.  Finally, the bot would need to listen for itself to be invoked to perform a search, fetch the search terms from the chat, execute a search in Alfresco and post the formatted results back to the chat channel.  This gives us the tools we need to make content and knowledge capture possible in chat without putting a bunch of extra work on the plate of the chat users.

I’ve put together a little proof of concept for this idea, and released it as an open source project on GitHub (side note, thank you Andreas Steffan for Dockerizing it!).  It’s implemented as a chatbot for Slack that uses the BotKit framework for the interaction bits.  It’s a simplistic implementation, but it gets the point across.  Upon startup the bot connects to your Slack and can be added to a chat just like any other bot / user.  Once it is there, it will listen for its name and for file upload events, responding more or less as described above.  Check out the readme for more info.
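
For anyone curious what the core of such a bot looks like, here is a trimmed-down sketch along the lines of the proof of concept.  It is only a sketch: it assumes BotKit’s classic Slack RTM events and the request library, the Alfresco host and target folder are placeholders, and most error handling is left out.  The real code on GitHub differs in the details.

const Botkit = require('botkit');
const request = require('request');

const ALFRESCO_HOST = 'https://alfresco.example.com';   // placeholder host
const API_BASE = ALFRESCO_HOST + '/alfresco/api/-default-/public/alfresco/versions/1';
const TARGET_FOLDER = '-my-';                           // alias or node ID of the folder to file things into

const controller = Botkit.slackbot({});
controller.spawn({ token: process.env.SLACK_BOT_TOKEN }).startRTM();

// When a file is shared in a channel the bot belongs to, copy it into Alfresco
controller.on('file_share', (bot, message) => {
  const file = message.file;

  // Stream the file out of Slack; the private URL requires the bot token
  const slackStream = request.get(file.url_private, {
    headers: { Authorization: 'Bearer ' + process.env.SLACK_BOT_TOKEN }
  });

  // Create the document as a child of the target folder via the Alfresco REST API
  request.post({
    url: API_BASE + '/nodes/' + TARGET_FOLDER + '/children',
    auth: { user: process.env.ALF_USER, pass: process.env.ALF_PASSWORD },
    formData: {
      name: file.name,
      filedata: { value: slackStream, options: { filename: file.name } }
    }
  }, (err, res, body) => {
    if (err) { return console.error(err); }
    const node = JSON.parse(body).entry;
    // Post a link back so the chat (and any captured chat log) points at the stored copy
    bot.reply(message, 'Archived "' + file.name + '" to Alfresco: '
      + ALFRESCO_HOST + '/share/page/document-details?nodeRef=workspace://SpacesStore/' + node.id);
  });
});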

I’d love to see this grow a bit and get better.  A few things that I’d like to do right away:

  1. Rework the bot to use the Alfresco Unified Javascript API instead of CMIS and direct REST API calls.
  2. Get smarter about the way content gets stored in Alfresco.  Right now it’s just a little logic that builds a folder structure by channel, this could be better.
  3. Improve the way metadata is handled for saved documents / chats so they are easier to find in the future.  Maybe a field that stores chat participant names?
  4. Some NLP smarts.  Perhaps run a chat excerpt through AWS Comprehend and tag the chat excerpt with the extracted topics?  A rough sketch of what that call might look like follows this list.
  5. Workflow integration.  I’d love to see the ability to post a document to the chat and request a review which then triggers an Alfresco Process Services process.
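
As a rough sketch of item 4 above, pulling key phrases out of a chat excerpt with the AWS SDK for JavaScript might look something like this (assuming the aws-sdk v2 Comprehend client and credentials configured in the environment; wiring the resulting phrases up as Alfresco tags is left out):

const AWS = require('aws-sdk');
const comprehend = new AWS.Comprehend({ region: 'us-east-1' });

// Extract key phrases from a chat excerpt so they can be applied as tags on the saved document
function extractTopics(chatExcerpt) {
  return comprehend
    .detectKeyPhrases({ Text: chatExcerpt, LanguageCode: 'en' })
    .promise()
    .then(data => data.KeyPhrases.map(phrase => phrase.Text));
}

// Example usage
extractTopics('The indexing job failed again after last night\'s Solr upgrade, can someone take a look?')
  .then(topics => console.log('Suggested tags:', topics))
  .catch(err => console.error(err));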

If you want to work on this together, pull requests are always welcome!


How to Host an Awesome Whisk(e)y Tasting

There is more to life than technology.  When I’m not working with the best team of Alfresco technologists and managers on the planet, hacking around on IoT and AI projects, or exploring what’s next in tech, I like to geek out on my other hobby: whiskey!  There are few things as fun (in my opinion) as getting a bunch of friends, colleagues, or both together for a little camaraderie and education.

Over the last few years my wife and I have had the immeasurable pleasure of hosting the Whiskies of the World events at the Birmingham Botanical Gardens.  Part education, part whiskeyfest, this class covers the history, botany, production and characteristics of whiskeys spanning the globe from Scotland and Ireland to America, Canada and Japan.  The best part of the class is the people that show up to learn a few things and share a taste or two.  After the class we almost always have a few people that come up to ask about hosting a tasting themselves, and they bring up a lot of great questions.  So many, in fact, that it seems worthwhile to jot down a few thoughts and best practices from our most popular and successful events.

No event goes off well without a good plan, right?  A whiskey tasting is no different.  We usually start planning our events by answering a few questions:

Who is the audience?

If you are inviting a group of people that don’t know the difference between a rye, a bourbon and a scotch, then perhaps it is best to start with good representative examples of several styles.  For a tasting like this we like to select a good base expression from distilleries around the world, specifically something that is representative of its style and region.  If your group consists of more experienced palates, then it might be best to drill down into a single style and explore it in depth or even focus on a single distillery.  If you are planning a little whiskey tasting as a part of something bigger, then maybe you can tailor your tasting to align with the main reason people are there.

What are we going to taste?

The intended audience sets the theme, but from there you still have a lot of choices to make.  Is the intent to focus on things that are readily available and can be purchased easily after the tasting by the people that attended?  It can be pretty frustrating to try something you love only to find out it is rare, allocated or otherwise unavailable.  That said, there is a place for a tasting of some true gems and rarities with the right group!  There are many ways to set a theme for your tasting.  You can just pick a few things, but I find that a theme makes it more interesting.

  • Single distillery vertical.  Find a distillery and sample their core expressions in age order.  This is a great chance to see how age affects whiskey, as you work your way from younger to older products made from the same base spirit.
  • Cask finish range.  Find a single distillery that offers expressions aged in various types of cooperage, such as ex-bourbon casks, sherry butts, port pipes, etc.  This can be a challenge as some of these can be hard to find, but the end result is a lot of fun.
  • Tasting within a single country.  If you want to go deep into what sets one country’s whiskey industry apart from the rest, find whiskies that are all from the same place across a range of distilleries.  Setting up a tasting of American whiskies or a tasting of scotch from a range of regions within Scotland provides a chance to explore the breadth of a region.
  • Tasting across countries.  What makes Scottish, American, Canadian, Irish, Japanese or Indian products distinct?
  • Blends vs. Singles.  Some blenders are fairly transparent about what goes into their blends.  Find a blend and some of the components, and see if you can pick out the component characteristics in the married product.
  • Proof.  Many distilleries offer their products in an uncut, barrel proof or cask strength expression.  Pick a couple and serve them straight and at various levels of dilution to see how water affects the nose and palate.
  • Mash bill.  Whiskeys around the world can be made from just about any kind of grain, within the confines of the laws that may or may not govern style in the country of origin.  How does a grain whiskey stack up to a malt whiskey from Scotland?  Perhaps compare a bourbon with a rye with an American wheat whiskey.
  • Discontinued products.  Due to supply problems, many still extant distilleries are dropping age statements from their products or discontinuing parts of their range.  If you are lucky enough to find or have a discontinued bottling or two, try a side by side with its replacement and see if you can pick out a difference.  Bonus points if you do it blind!
  • Silent stills.  Distilleries come and go, just like any other business.  Tasting products from now silent distilleries might shed light on why they didn’t make it, or show you some amazing products that never should have been lost.
  • “Craft” distilleries.  Whiskey is on everybody’s radar right now, and that means new distilleries are popping up all the time.  While everybody knows the stalwarts of the whiskey world, some of these newer producers have some interesting products hitting the market.

How many things are we going to taste, and in what order?

In general, I think it is best to limit the number of products in a single tasting to five or six, at the most.  Past that, your palate is fatigued and it all starts to run together.  There is also a danger of overindulging.  I like to keep the tasters fairly small, 0.5 to 0.75 oz (15-22 ml) of each product.  The tasting order also matters.  High proof whiskey or heavily peated products will overwhelm other things in your lineup.  Start with more delicate, lower proof samples and work your way up to the heavy hitters like barrel proof offerings or richer, heavily sherried, cask strength or peated malts.

What else do we need?

First up, glassware or another way to serve it.  When we host a large public tasting we use small disposable shot glasses and pour everything ahead of time.  When we host at home I prefer something more purpose built like a copita or the classic Glencairn glass.  If you don’t have those, a snifter will work.  There are dozens of styles of whiskey glasses out there; which one to use is very much a matter of personal preference.  Just be sure to rinse and dry it between pours so you don’t get cross contamination or unwanted dilution.

Second, make sure you provide water, for several reasons.  It’s good to have a bit between tastes or pours.  Water can help “open up” a whiskey, so it is good to have on hand so you can add a few drops and see how it affects a sample.  Finally, some people aren’t used to drinking whiskey straight and a glass of water can help tamp down the burn.  If you plan to add a little to your tasters, having small spoons or droppers on hand is a good idea.

We also like to provide some snacks or palate cleansers for the space between different pours.  Ask a dozen different professional tasters what they use and you’ll probably get a dozen different answers.  I like unsalted (or lightly salted) pretzels, crackers, etc and a little carbonated water.  Some people use white bread, or another bland baked good.  Whatever works for you.

Finally, there can be other things to add to your setup to make the tasting more interesting.  For our Whiskies of the World class we use a world map to point out the (rough) origin of each pour.  We use these maps as placemats and place each sample at its corresponding number on the map.


Another idea is to provide some small tasting notebooks for making notes on what your attendees think of each pour, or perhaps to write down a ranked order of what they liked best for comparison at the end.  If you really want to get fancy you can put together a sheet that describes what you are tasting and why as a handout, perhaps including some history of each product or region.  For our public guided tastings we use a PowerPoint presentation, but that might be a bit much for something at home.  No matter how you do it, it does help to convey some information about what you are trying.  It’s a learning experience.

Now that you have a plan, you know who is coming, what you are serving and have your setup ready, you can host an awesome tasting!  The most important thing to remember is that good whiskey is best when shared with good people, so have fun with it.


Alfresco Javascript API and AWS Lambda Functions. Part 1, Lambda and Cloud9 101


I’ve written before about several ways to make use of AWS Lambda functions within your Alfresco Content and Process Services deployment.  In preparation for my DevCon talk, I’m diving deeper into this and building out some demos for the session.  I figured this is a good time to do a quick writeup on one way to approach the problem.

What is a Lambda function?

Briefly, AWS Lambda is a “serverless” compute service.  Instead of the old model of provisioning an EC2 instance, loading software and services, etc, Lambda allows you to simply write code against a specific runtime (Java, Node, Python, or .NET) that is executed when an event occurs.  This event can come from a HUGE number of sources within the AWS suite of services.  When your function is called, it is spun up in a lightweight, configurable container and run.  That container may or may not be used again, depending on several factors.  AWS provides some information about what happens under the hood, but the idea is that most of the time you don’t need to sweat it.  In summary, a Lambda function is just a bit of code that runs in response to a triggering event.

Preparing the Lambda package

Creating a Lambda function through the AWS UI is trivial.  A few clicks, a couple form fields, and you’re done.  This is fine for simple functions, but what about when you need to use an external library?  The bad news is that this takes a couple extra steps.  The good news is that once you have done it, you can move on to a more productive workflow.  The sticking point with doing it all through the AWS console is the addition of libraries and dependencies.  We can get around that by using a Zip package to start our project.  The Zip package format is pretty simple.  Let’s create one; we’ll call it AlfrescoAPICall.

Start by creating an empty directory for your project, and changing into that directory:

mkdir AlfrescoAPICall

cd AlfrescoAPICall

Next, create a handler for your Lambda function.  The default name for the handler is index.js, but you can change it so long as you configure your Lambda function appropriately.

touch index.js

Now, use npm to install the modules you need into your project directory.  For this example, we’ll use alfresco-js-api.

npm install alfresco-js-api
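
At this point index.js is still empty.  As a sketch of where this is headed (only a sketch: it assumes the module exports the AlfrescoApi constructor with a promise-based login call, and the host and credentials are placeholders), the handler might start out looking something like this:

const AlfrescoApi = require('alfresco-js-api');   // assumption: the module exports the AlfrescoApi constructor

exports.handler = (event, context, callback) => {
  // Point the client at an Alfresco Content Services instance (placeholder host)
  const alfrescoJsApi = new AlfrescoApi({ hostEcm: 'https://alfresco.example.com' });

  // Log in, then hand the resulting ticket back as the function's response
  alfrescoJsApi.login(process.env.ALF_USER, process.env.ALF_PASSWORD)
    .then(ticket => callback(null, { message: 'Logged in to Alfresco', ticket: ticket }))
    .catch(err => callback(err));
};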

A real project probably wouldn’t just install the needed modules piecemeal; it makes more sense to define all of the dependencies in package.json instead.  Regardless, at the end of this you should have a project root folder that contains your Lambda function handler, and a node_modules directory that contains all of your dependencies.  Next, we need to Zip this up into a Lambda deployment package.  Don’t Zip up the project folder, we need to Zip up the folder contents instead.
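
For reference, a minimal package.json for this project might look something like the following (the version number is purely illustrative):

{
  "name": "alfresco-api-call",
  "version": "1.0.0",
  "description": "AWS Lambda function that talks to Alfresco via the JavaScript client API",
  "main": "index.js",
  "dependencies": {
    "alfresco-js-api": "^2.3.0"
  }
}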

zip -r AlfrescoAPICall.zip .

And that’s it!  AlfrescoAPICall.zip is the Lambda package that we’ll upload to AWS so we can get to work.  We don’t need to do this again unless the dependencies or versions change.

Getting it into AWS

There are a few ways to get our newly created deployment package up to AWS.  It can be done using the AWS CLI, or with the AWS console.  If you do it via the CLI, it will look something like this:

aws lambda update-function-code --function-name AlfrescoAPICall --zip-file fileb://AlfrescoAPICall.zip

If you do it via the AWS console, you can simply choose the file to upload.


Regardless of how you update your Lambda function, once you have uploaded your zip archive you can see the entire project in the “edit code inline” view on the Lambda page.


Woo hoo!  Now we have a skeleton Lambda project that includes the Alfresco Javascript Client API and we can start building!

Starting development in earnest

This process is simple, but it isn’t exactly the easiest way to build in AWS.  With the relaunch of Cloud9 at re:Invent, AWS has a pretty good web based IDE that we can use for this project.  I’m not going to go through all the steps of creating a Cloud9 environment, but once you have it created you should see your newly created Lambda function in the right-hand pane under the AWS Resources tab.  If you don’t, make sure the IAM account you are using with Cloud9 (not the root account!!) has access to your function.  It will be listed under Remote Functions.  Here’s where it gets really cool.  Right click on the remote function, and you can import the whole thing right into your development environment.



Neat, right?  After you import it you’ll see it show up in the project explorer view on the left.  From here it is basically like any other IDE.  Tabbed code editor, tree views, syntax highlighting and completion, etc, etc.


One cool feature of Cloud9 is the ability to test run your Lambda function locally (in the EC2 instance Cloud9 is connected to) or on the AWS Lambda service itself by picking the option from the menu in the run panel.  As one would expect, you can also set breakpoints in your Lambda function for debugging.


Finally, once you are done with your edits and have tested your function to your satisfaction, getting the changes back up to the Lambda service is trivial.  Save your changes, right click on the local function, and select deploy.


Simple, right?  Now we have a working Lambda function with the Alfresco Javascript Client API available, and we can start developing something useful!  In part two, we’ll continue by getting Alfresco Process Services up and running in AWS and extend this newly created function to interact with a process.


Alfresco DevCon is Back in 2018


Alfresco DevCon has always been my favorite of the many events that we host around the world.  Unfortunately it has been on a bit of a hiatus lately as we explored other ways to connect with the many personas that we work with every day.  Good news though, it’s back in 2018!  Alfresco brings to market an awesome platform for accelerating digital business, and you can’t be a platform company without giving your developers a big friendly hug (and tools, and best practices, and a whole host of other things).  This is THE best opportunity to come and interact with all of the extended Alfresco developer family.  I’ve had a sneak peek at the presenter list and it’s an incredibly diverse group pulled from many of our stellar customers, partners, community members, and of course, a healthy dose of Alfrescans from across the technical parts of the company.

In what turned out to be a happy coincidence, I received the notice that my talks were accepted on my birthday.  Talk about a great present!  I’ve signed up to do two talks.  The first is a lightning talk on using natural language processing techniques to improve the quality of enterprise search.  The lightning talk got a shot in the arm with the recent announcement of AWS Comprehend.  Originally this talk was just going to cover some on-premises offerings as well as Google Cloud NLP.  I’m excited to play around with the new AWS service and see how it stacks up.

The second talk I’m going to do in Portugal is a full length presentation on using Alfresco Process Services with AWS IoT.


Originally the IoT talk was going to use a custom-built device, probably a Raspberry Pi or an Arduino-based device.  However, when one of these little guys showed up in the mail I decided to change it up a bit and use Amazon hardware.


If these topics sound familiar, they should, because I’ve blogged about both of them in the past.  Alfresco DevCon provides a great opportunity to build on and refine those ideas and vet them with a highly technical audience.  Hopefully the talks will spark some interesting conversations afterward.

I can’t overstate how happy I am that DevCon is back, and I can’t wait to see all of you in Lisbon in January!



Custom Angel’s Envy Bourbon Preamp Build

Bourbon whiskey has been surging in popularity in recent years, resulting in shortages of some popular brands, long lines, and camping for special releases.  Crazy.  Along with the newfound popularity of bourbon has come new bottlers and distilleries experimenting with cask finishes in much the same way the Scots have been doing for ages.  One notable brand doing that sort of experiment is Angel’s Envy.  Their flagship product is a port barrel finished bourbon, offered both as a regular bottling and a once a year cask strength release.  Several years ago a friend got me a bottle of the cask strength version which was thoroughly enjoyed.  With the whiskey gone, what to do with the awesome rough cut wooden box it came in?  Seems a shame to just throw it away.  Let’s upcycle it into something cool instead!

I’m a bit of a hobbyist maker, and have done a lot of woodworking and electronics projects over the years.  Languishing in my parts bin were most of the components I bought for a tube preamp project I started but never finished for lack of a good enclosure.  I bought the transformer and power supply from eBay as a kit, and sourced the tubes, tube shields, sockets, connectors, and passive components from a number of other places.  A little test fitting confirmed that there was room in the Angel’s Envy box to house all of the bits and pieces if I mounted the tubes and transformer on the outside.

Cooling came up as a possible problem, but with the tubes on the outside and proper ventilation to get some airflow going around the power supply it should be OK.  I’ll stick a digital thermometer in there the first time it runs just to be sure.  If it is too hot, I can always route out the bottom of the case and install a quiet little fan.  A set of conical feet turned from cocobolo wood raise the box up to allow that airflow configuration to work.  Given that the enclosure is wood it can’t be used for grounding like a metal enclosure, but that’s easy enough to solve.  Another problem might be interference since a wood case does not provide any shielding.  A layer of adhesive metal foil applied to the inside should take care of that.

With parts and design in hand, I headed down to Red Mountain Makers to lay out all of the components and get the holes drilled.  One downside to condo life is that we don’t have room for a drill press and the holes needed to be more precise than I can do with a hand drill.  Thankfully our local makerspace has everything I needed and more.  As an aside, if you have a local makerspace, find it, join it and use it.  Not only will you get access to the tools you need, but you’ll find friendly people with boatloads of expertise too.  Many of them are entirely volunteer run and they depend on regular memberships to stay up and running.  After a quick trip I got the transformer, tube sockets, shields and tubes all test fit and mounted.  It will have to come apart later as everything gets wired up, but better to try the fit first before I get to soldering it all together.

There are a few parts still on order, such as a fancy chrome cover for the transformer, some internal bits and pieces and such, but it is coming along nicely.  I’ll post an update with more pictures once it is all completed, but I’m too excited about the progress to not share a bit now!

What My Inbox Says About the State of Content Management

[Screenshot: my inbox showing 10,000 unread messages]

I hit a major milestone today.  10000 unread messages in my inbox.  Actually, 10001 since one more came in between the time I took that screenshot and the time I started writing this article.  People tend to notice big round numbers, so when I logged in and saw that 10k sitting there I had a moment of crisis.  Why did it get that bad?  How am I ever going to clean up that pile of junk?  Am I just a disorganized mess?  Should somebody that lets their inbox get into that state ever be trusted with anything again?  It felt like failure.

Is it failure, or does it indicate that the shift from categorization to search as the dominant way to find things has slowly become complete enough that I no longer really care (or, more to the point, no longer need to care) how many items sit in the bucket?  I think it is the latter.

Think back to when you first started using email.  Maybe take a look at your corporate mail client, which likely lags the state of the art in terms of functionality (or in case you are saddled with Lotus Notes, far, FAR behind the state of the art).  Remember setting up mail rules to route stuff to the right folders, or regularly snipping off older messages at some date milestone and stuffing them into a rarely (never?) touched archive just in case?  Now think about the way you handle your personal mail, assuming that you are using something like Gmail or another modern incarnation.  Is that level of categorization and routing and archiving still necessary?  No?  Didn’t think so.  Email, being an area of fast moving improvement and early SaaS colonization, crossed this search threshold quite some time ago.  Systems that deal with more complex content in enterprise contexts took (and are still taking) a bit longer.  Bruce Schneier talks a bit about this toward the beginning of his book Data and Goliath where he states “for me, saving and searching became easier than sorting and deleting”.  By the by, Data and Goliath is a fantastic book, and I highly recommend you give it a read if you want to find yourself terrified by what is possible with a hoard of data.

So, what does this have to do with content management systems?  A lot, actually.

One of my guiding principles for implementing content management systems is to look for the natural structure of the content.  Are there common elements that suggest a structure that minimizes complexity?  Are there groupings of related content that need to stay together?  How are those things related?  Is there a rigid taxonomy at work or is it more ad-hoc?  Are there groups of metadata properties that cut across multiple types of content?  What constructs does your content platform support that align with the natural structure of the content itself?  From there you can start to flesh out the other concerns such as how other applications will access it and what type of things those applications expect to get back.  The takeaway here is to strike a balance between the intrinsic structure of what you have (if it even has any structure at all), and how people will use it.

I’ve written previously about Alfresco’s best practices, and one of the things that has always been considered to be part of that list is paying attention to the depth and degree of your graph.  Every node (a file, a folder, etc) in Alfresco has to have a parent (except for the root node), and it was considered a bad practice to simply drop a million objects as children to a single parent.  A better practice was to categorize these and create subcontainers so that no single object ended up with millions of children.  For some use cases this makes perfect sense, such as grouping documents related to an insurance claim in a claim folder, or HR documents in a folder for each person you employ, or grouping documents by geographical region, or per-project folders, etc.

Recently though, I have seen more use cases from customers where that model feels like artificially imposing a structure on content where no such structure exists.  Take, for example, policy documents.  These are likely to be consistent, perhaps singular documents with no other content associated with them.  They have a set of metadata used as common lookup fields like names, policy numbers, dates in force, etc.  Does this set of content require a hierarchical structure?  You could easily impose one by grouping policy docs by date range, or by the first character or two of the policy holder’s last name, but does that structure bring any value whatsoever that the metadata doesn’t?  Does adding that structure bring any value to the repository or the applications that use it? I don’t think it does.  In fact, creating structures that are misaligned to the content and the way it is used can create unnecessary complexity on the development side.  Even for the claim folder example above, it might make the most sense to just drop all claim folders under a common parent and avoid creating artificial structure where no intrinsic structure exists.  Similar to the inbox model, save and search can be more efficient.

Can you do this with Alfresco?  We have done some investigation and the answer appears to be “yes”, with some caveats.  We have several customers successfully using large collections of objects, and as long as they stay between some reasonable guardrails it works.  First, make sure that you aren’t creating a massive transaction when moving content under a single parent.  This is usually a concern during migrations of large collections.  One nice side-effect of breaking content down into smaller containers is that the same tools that do that usually help you to avoid creating massive single transactions that can manifest as indexing delays later.  Second, make sure you are accessing these large collections in a reasonable way.  If you request all the children of a parent and those children number in the millions, you’re going to have a bad time.  Use pagination to limit the number of results to something that you can reasonably handle.  You can do this easily with most of Alfresco’s APIs, including CMIS.  Even better, only retrieve those objects that you need by taking advantage of metadata search.  Finally, don’t try to browse those folders with extremely large numbers of children.  Share can load more than it needs when loading up folder content in the document library, which may cause a problem.  Really though, what value is there in trying to browse a collection that large?  Nobody is going to look past the first page or two of results anyway.
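
To make the pagination point concrete, here is a rough sketch of walking a very large folder a page at a time using the v1 REST API’s skipCount and maxItems parameters.  Node and the request library are used purely for illustration, and the host, folder ID, and credentials are placeholders:

const request = require('request');

// Fetch a single page of children from a (potentially huge) folder
function getChildrenPage(folderId, skipCount, maxItems, callback) {
  request.get({
    url: 'https://alfresco.example.com/alfresco/api/-default-/public/alfresco/versions/1/nodes/'
      + folderId + '/children',
    qs: { skipCount: skipCount, maxItems: maxItems },
    auth: { user: process.env.ALF_USER, pass: process.env.ALF_PASSWORD },
    json: true
  }, (err, res, body) => {
    if (err) { return callback(err); }
    // pagination.hasMoreItems tells us whether another page is waiting
    callback(null, body.list.entries, body.list.pagination.hasMoreItems);
  });
}

// Example: grab the first 100 children instead of asking for millions at once
getChildrenPage('some-folder-node-id', 0, 100, (err, entries, hasMore) => {
  if (err) { return console.error(err); }
  entries.forEach(e => console.log(e.entry.name));
  console.log('More pages available:', hasMore);
});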

So there you (mostly) have it.  Listen to your content, listen to your users, listen to your developers, and don’t try to build structure where it doesn’t exist already.  Search is your friend.

Footnote:  When I posted the size of my unread inbox to Facebook, people had one of two reactions.  The first was “Heh, amateur, I have 30K unread in mine”.  The second was a reaction of abject horror that anybody has an inbox in that state.  Seems the “sort and delete” method still has its followers!

Spinning up the SmarterBham project at Code for Birmingham


There are few things that get my inner geek as excited as the intersection of technology and the public sphere.  We have only begun to scratch the surface of the ways that technology can improve governance, and the ways we can put the power to transform the places people live in the hands of those people themselves.  This sort of civic hacking has been promoted by groups like Code for America for some time.  Code for America is loosely organized into “brigades” that serve a particular city.  These independent units operate all over the US, and have gone worldwide.  Like any town worth its salt, Birmingham has its own brigade.  I first became aware of it back in 2015, attended a few meetings, and then it fell off my radar.  The group has produced a lot of valuable work, including an app for spotting and reporting potholes, contributions to open data policies, and traffic accident analysis.

For about a year now I’ve grown increasingly interested in building IoT devices for monitoring various aspects of city life.  My first project was an air quality monitor (which is still up and running!).  At the same time I got interested in The Things Network and other ways citizens can participate and own the rollout of IoT projects at scale.  The price of technology has dropped so far and connectivity has become so ubiquitous that it is entirely feasible for a group of dedicated people to roll out their own IoT solutions with minimal monetary investment.

When these two things collided, something started happening.  Some of the folks at Code for Birmingham got excited.  I got excited.  Community partners got excited.  We made a plan.  We designed some things.  We ordered parts.  We started coding.  We made a pitch deck (because of course you need a pitch deck).  We applied for grants.  We built a team.  A couple months down the road we’re making serious progress.  One of our team members has made huge strides in building a prototype.  Another has started on our AWS templates.  We’re getting there.

Take a look at what we’re building and if you want to be a part of something awesome, get in touch.  We need designers, coders, CAD gurus, testers, writers, data wizards, and of course, some dreamers.  All are welcome.


(Possibly) Enhancing Alfresco Search Part 2 – Google Cloud’s Natural Language API


In the first article in this series, we took a look at using Stanford’s CoreNLP library to enrich Alfresco Content Services metadata with some natural language processing tools.  In particular, we looked at using named entity extraction and sentiment analysis to add some value to enterprise search.  As soon as I posted that article, several people got in touch to see if I was working on testing out any other NLP tools.  In part 2 of this series, we’ll take a look at Google Cloud’s Natural Language API to see if it is any easier to integrate and scale, and do a brief comparison of the results.

One little thing I discovered during testing that may be of note if anybody picks up the GitHub code and tries to do anything useful with it:  Alfresco and Google Cloud’s Natural Language API client library can’t play nicely together due to conflicting dependencies on some of the Google components.  In particular, Guava is a blocker.  Alfresco ships with, and depends on, an older version.  Complicating matters further, the Guava APIs changed between the version Alfresco ships and the version the Google Cloud Natural Language client library requires, so it isn’t as straightforward as grabbing the newer Guava library and swapping it out.  I have already had a quick chat with Alfresco Engineering, and it looks like this is on the list to be solved soon.  In the meantime, I’m using Apache HttpClient to call the relevant services directly.  It’s not quite as nice as the idiomatic approach the Google Cloud SDK takes, but it will do for now.
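
For the curious, the direct approach looks roughly like the sketch below.  This is a simplified example, not the exact code from the repo:  the API key and document text are placeholders, and a real integration would build the JSON with a proper library and handle errors.

```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class AnalyzeEntitiesExample {
    public static void main(String[] args) throws Exception {
        String apiKey = "YOUR_API_KEY";  // placeholder
        String endpoint = "https://language.googleapis.com/v1/documents:analyzeEntities?key=" + apiKey;

        // Hand-built request body; a real integration would use a JSON library.
        String body = "{"
                + "\"document\": {\"type\": \"PLAIN_TEXT\", \"content\": \"Alfresco was founded in 2005.\"},"
                + "\"encodingType\": \"UTF8\""
                + "}";

        try (CloseableHttpClient client = HttpClients.createDefault()) {
            HttpPost post = new HttpPost(endpoint);
            post.setEntity(new StringEntity(body, ContentType.APPLICATION_JSON));
            try (CloseableHttpResponse response = client.execute(post)) {
                // The response is a JSON object containing an "entities" array.
                System.out.println(EntityUtils.toString(response.getEntity()));
            }
        }
    }
}
```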

Metadata Enrichment and Extraction

The main purpose of these little experiments has been to assess how suitable each tool might be for using NLP to improve search.  This is where, I think, Google’s Natural Language product could really shine.  Google is, after all, a search company (and yes, more than that too).  Google’s entity analyzer not only plucks out all of the named entities, but also returns a salience score for each.  The higher the score, the more important or central that entity is to the entire text.  The API also returns the number of proper noun mentions for each entity.  This seems to work quite well, and the salience score isn’t driven by mention counts alone; during my testing I found several instances where the most salient entity was not the one mentioned most often.  Sorting by salience and making only the most relevant entities searchable metadata in Alfresco would be useful.  Say, for example, we are looking for documents about XYZ Corporation.  A simple keyword search would return every document that mentions that company, even if the document wasn’t actually about it.  Searching only those documents where XYZ Corporation is the most salient entity (even if not the most frequently mentioned) would give us much more relevant results.
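
As a rough sketch of what that filtering might look like, the snippet below pulls entity names out of an analyzeEntities response and keeps only those above a salience threshold.  The threshold value and the use of Jackson for parsing are my own choices for illustration, and the Alfresco property you would write the results to depends entirely on your content model.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.ArrayList;
import java.util.List;

public class SalienceFilter {

    // Keep only entities whose salience clears a (somewhat arbitrary) threshold.
    // The resulting names are what you might write to a multi-valued Alfresco property.
    public static List<String> salientEntities(String analyzeEntitiesJson, double threshold)
            throws Exception {
        JsonNode root = new ObjectMapper().readTree(analyzeEntitiesJson);
        List<String> keep = new ArrayList<>();
        for (JsonNode entity : root.path("entities")) {
            if (entity.path("salience").asDouble() >= threshold) {
                keep.add(entity.path("name").asText());
            }
        }
        return keep;
    }
}
```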

Sentiment analysis is another common feature of natural language processing suites that may be useful in a content services search context.  For example, if you are using your content services platform to store customer survey results, transcripts of chats, or other documents that capture an interaction, you might want to find those that were strongly negative or positive to serve as training examples.  Another great use case exists in the process services world, where processes are likely to capture interactions in a more direct fashion.  Sentiment analysis is an area where Google’s and CoreNLP’s approaches differ significantly.  The Google Natural Language API provides two ways to handle it:  the first analyzes the overall sentiment of the provided text, and the second provides sentiment analysis for each identified entity within the text.  These are fairly simplistic compared with the full sentiment graph that CoreNLP generates.  Google scores sentiment along a scale of -1 to 1, with -1 being the most negative and 1 the most positive.
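
As a sketch of how that score might become something searchable, the snippet below reads documentSentiment.score out of an analyzeSentiment response and buckets it into a coarse label that could be stored as a faceted property.  The cut-off values are arbitrary, and Jackson is again just my choice for parsing.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SentimentBucketer {

    // Map Google's -1 to 1 sentiment score onto a coarse, facet-friendly label.
    // The cut-off points are arbitrary; tune them against your own content.
    public static String bucket(String analyzeSentimentJson) throws Exception {
        JsonNode root = new ObjectMapper().readTree(analyzeSentimentJson);
        double score = root.path("documentSentiment").path("score").asDouble();

        if (score <= -0.25) return "negative";
        if (score >= 0.25) return "positive";
        return "neutral";
    }
}
```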

Lower Level Features

At the core of any NLP tool are the basics of language parsing and processing:  tokenization, sentence splitting, part-of-speech tagging, lemmatization, dependency parsing, and so on.  The Google Cloud NL API exposes all of these through its syntax analysis API and the token object it returns.  The structure of that object is clear and easy to understand.  There are some important differences in the way these features are implemented across CoreNLP and Google Cloud NL, which I may explore further in a future article.
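
To give a feel for the shape of that token object, here is a small sketch that walks the tokens array in an analyzeSyntax response and prints the surface form, lemma, part-of-speech tag, and dependency label for each token (Jackson is used here purely for convenience).

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SyntaxTokenDump {

    // Dump the basics of each token returned by the analyzeSyntax endpoint.
    public static void dumpTokens(String analyzeSyntaxJson) throws Exception {
        JsonNode root = new ObjectMapper().readTree(analyzeSyntaxJson);
        for (JsonNode token : root.path("tokens")) {
            String content = token.path("text").path("content").asText();
            String lemma = token.path("lemma").asText();
            String posTag = token.path("partOfSpeech").path("tag").asText();
            String depLabel = token.path("dependencyEdge").path("label").asText();
            System.out.printf("%-15s lemma=%-15s pos=%-6s dep=%s%n",
                    content, lemma, posTag, depLabel);
        }
    }
}
```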

Different Needs, Different Tools

Google Cloud’s Natural Language product differs from CoreNLP in some important ways.  The biggest is simply the fact that one is a cloud service and the other is traditionally released software.  This has its pros and cons, of course.  If you roll your own NLP infrastructure with CoreNLP (whether on-premises or in the cloud) you’ll certainly have more control, but you’ll also be responsible for managing the thing.  For some use cases this might be the critical difference.  As best I can tell, Google doesn’t allow for custom models or annotators (yet).  If you need to train your own system or build custom steps into the annotation pipeline, Google’s NLP offering may not work for you.  This is likely to be a shortcoming of many cloud-based NLP services.

Another key difference is language support.  CoreNLP ships with models for English, Arabic, Chinese, French, German and Spanish, but not all annotators work for all languages.  CoreNLP also has contributed models in other languages of varying completeness and quality.  Google Cloud’s NL API has full-fledged support for English, Japanese and Spanish, with beta support for Chinese (simplified and traditional), French, German, Italian, Korean and Portuguese.  Depending on where you are and what you need to analyze, language support alone may drive your choice.

On the feature front there are also some key differences when you compare “out of the box” CoreNLP with the Google Cloud NL API.  The first thing I tested was entity recognition.  I have been doing a little testing with a collection of short stories by American writers, and so far both seem to do a fair job of recognizing basic named entities like people, places, organizations, etc.  Google’s API goes further, though, and will recognize and tag things like the names of consumer goods, works of art, and events.  CoreNLP would take more work to do that sort of thing, since it isn’t handled by the models that ship with the code.  On sentiment analysis, CoreNLP is much more comprehensive (at least in my admittedly limited evaluation).

Scalability and ergonomics are also concerns.  If you plan to analyze a large amount of content, there’s no getting around scale.  Without question, Google wins there, but at a cost.  The Cloud Natural Language API uses a typical utilization-based cost model:  the more you analyze, the more you pay.  Ergonomics is another area where Google Cloud NL has a clear advantage.  CoreNLP is a more feature-rich experience, and that shows in the model it returns.  The Google Cloud NL API just returns a logically structured JSON object, making it much easier to read and interpret the results right away.  There’s also the issue of interface.  CoreNLP relies on a client library.  The Google Cloud NL API is just a set of REST calls that follow the usual Google conventions and authentication schemes.  There has been some work to put a REST API on top of CoreNLP, but I have not tried it out.

The more I explore this space the more convinced I am that natural language processing has the potential to provide some significant improvements to enterprise content search, as well as to content and process analytics.