Making the Case for Open Source Hardware, Open Data, and Open AI


This is a longish article.  TL;DR: If we don’t get a handle on open hardware, open data policies, and open access to intelligence, we’ll be back to buttoned-up, closed ecosystems in no time.  It may already be too late.

Open Source has won in the software world.

Let that sink in for a minute.  Do you agree?  Disagree?  Is that statement too bold?  Look across the spectrum of tools that we use to build the software and services that run the world.  Now, look at how many of them are open.  From the operating system up to protocols and service infrastructure to libraries to the toolkits we use to build user experiences, you’ll find open source and open standards everywhere you look.  Open has become the de facto way we build important pieces of software.

All this open source love in the software sphere is great, and it is changing the world, but I can’t help but feel like we are just beginning this journey.  In the early days of technology, openness was the default because so much of the innovation was coming from universities and government projects, which were (arguably) used to open collaboration models.  Even your electronics came with a schematic (basically the hardware source code) so they could be repaired.  Commercialization and disposable products led to much of the software that ran the world becoming closed, and hardware designs that were no longer user-serviceable.  Closed was harder to reverse engineer and imitate, so it was an attractive way to distribute technology for a commercial product.

This transition was not without its pain when engineers and technicians who were used to being able to crack open a product and fix it discovered that they could no longer do so.  It is said that this pain is, in part, what led to the creation of the FSF and the GNU project, the history of which needs a book, not a blog post.  During the dark days of the 1980s and 90s (when I started my career as a developer) closed products were how most work got done.  We got poorly documented APIs into a black box of logic, and it was a Bad Thing.

The Internet turned that tide back.  Suddenly two developers on opposite ends of the earth with an idea could collaborate.  They could communicate.  They could rally others to their cause.  We needed common platforms to build things on in this new world of connectivity.  We wanted to tinker, to experiment, and closed systems are directly opposed to those goals.  We got GNU/Linux.  We got Apache and all the projects that the foundation incubated.  Open was back in the mainstream.

Fast forward to 2018, and for building stuff on the web, open is the default choice.

While the Internet brought new life to open source, it also created a time of rapidly contracting, overlapping technology cycles.  Terminal or standalone apps were replaced with client-server architectures, which rapidly gave way to the web, which in turn quickly shifted to mobile as the dominant way to interact.  The next disruption is already underway: the shift to IoT and ubiquitous computing, led by voice platforms like Alexa, Google Home, and others.  What does open source mean in this world?

In a world of ubiquitous computing, technology is everywhere and nowhere at the same time.  It isn’t something you need to consciously pick up and use, it is baked into the things you already touch every day.  Your car.  Your glasses.  Your home.  Your clothing and shoes.  Your watch.  Your kid’s toys.  Your stereo.  Your appliances.  Computing will respond to your voice, gestures, expressions, location, and data from all your other sensors as well as broader sets of data about the world.  Done right, it will be a seamless experience that brings you the information you need when and where you need it and will let you affect your world from wherever you happen to be.  Achieving this means a boatload of new hardware, it means unfathomable volumes of data being captured and made available across all the platforms you use, and it means intelligence that can make sense of it all.

While we may have open source dominance in many parts of the software world, that won’t be enough in the very near future.  Moving into a world of ubiquitous computing while maintaining open source relevance means we need open components up and down the stack.  This starts at the bottom, with hardware.  We have a plethora of options today, thanks to projects and platforms like the Arduino, the Raspberry Pi, BeagleBoard, Particle, and others. This is not intended to be an exhaustive list of open hardware platforms, just a few examples.  If I left out your favorite, please leave a comment! These aren’t all fully open source, as some depend on hardware that may be patent encumbered or proprietary.  They are open in other important ways though, from board designs to developer toolchains to the software that runs on them.  With these types of components and a good idea, a modestly funded group or dedicated individual can build and launch something meaningful.

Am I saying that in some kind of open hardware utopia that everybody will hack together their own smartwatch in their kitchen?  No, no I’m not.  What I am saying is that these open source hardware options drop the barrier to entry for Internet-connected hardware down to where it needs to be (as low as possible), just like the Apache HTTP server did for people serving up HTML in the early days of the web.  These new tinkerers will birth tomorrow’s products or find other ways to participate in ubiquitous computing through their experiences.  If the barriers to entry aren’t kept low, only larger companies will be able to play in this space, and that would be a Bad Thing.  Say hello to the 1980s all over again.  Hopefully the hair doesn’t come back too.

If open hardware is the foundation, what about the data that it generates?  The data and the actions driven by that data are where the value exists, after all.  What good is an Internet-connected shower head if it can’t text you when you are about to jump in and scald yourself, while also queueing up care instructions for first-degree burns on your television and ordering a new bottle of aloe lotion from Amazon?  Again, there is a robust ecosystem of open source tools for collecting and managing the data, so we have somewhere to start.  You can even run these open source tools on your cloud provider of choice, which again keeps the barrier to entry nice and low.  Say what you will about utility computing, but it sure makes it cheap and easy to try out a new idea.

This is all well and good for the things we might build ourselves, but those are not going to be the only things that exist in our world of ubiquitous computing.  We’ll have devices and services from many vendors and open projects running in our lives.  Given that the data has such value, how can we ensure that data is open as well?  It is (or should be) ours, after all.  We should get access to it as if it were a bank balance, and decide when and how we share it with others.  Open source can inform this discussion as well through the development and application of open data policies.  I envision a future where these policies are themselves code that runs across providers, and they can be forked and merged and improved by the community of people to whom they apply.  The policy then becomes both a mechanism for opening data up to a myriad of uses that we control and a form of open source code itself.  This could enable the emergence of new marketplaces, where we set prices for the various types of data that we generate and companies bid with something of value (services, cash, whatever) to access it.  This happens today, albeit with limited scope.  If you use Facebook, Gmail, Instagram, LinkedIn or any other freebie like these, you are already buying a service with your data in a siloed sort of way.  Your data is your currency and the product that these companies resell.  Their service is the price they pay you to use your data.
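To make the "policies as code" idea a little more concrete, here is a purely hypothetical Java sketch of what a small, forkable data-sharing policy might look like.  None of the names here come from a real framework or standard; they are illustrative only.

```java
import java.time.Instant;
import java.util.Set;

/**
 * Purely hypothetical sketch of a data-sharing policy expressed as code: a small,
 * versionable artifact that could live in a public repository, be forked and merged
 * like any other source file, and be evaluated by any provider that holds your data.
 */
public final class DataSharingPolicy {

    public enum Purpose { HEALTH_INSIGHTS, MUSIC_RECOMMENDATIONS, ADVERTISING }

    private final Set<String> dataTypes;        // e.g. "heart-rate", "location", "purchases"
    private final Set<Purpose> allowedPurposes; // what the data may be used for
    private final Instant expiresAt;            // consent is time-boxed and must be renewed

    public DataSharingPolicy(Set<String> dataTypes, Set<Purpose> allowedPurposes, Instant expiresAt) {
        this.dataTypes = dataTypes;
        this.allowedPurposes = allowedPurposes;
        this.expiresAt = expiresAt;
    }

    /** A provider would evaluate this before using a piece of data for a given purpose. */
    public boolean permits(String dataType, Purpose purpose, Instant now) {
        return now.isBefore(expiresAt)
                && dataTypes.contains(dataType)
                && allowedPurposes.contains(purpose);
    }
}
```

Something like this could be published, audited, forked and improved by the community it applies to, which is exactly the property that makes open source work.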

The final piece of the puzzle is intelligence.  That is, the tools to sift through all the data our lives generate, extract meaningful patterns and insights, and act on them.  Because so much of the AI world today is still straddling the academic and the practical, the open mentality has a strong foothold here, much like it did when the Internet itself was emerging from research projects and becoming a commercial force.  Take a quick look around the software projects used in companies building their future on AI.  You’ll quickly find that many of the most important are open source.  That’s great and all, but without data and models the tools themselves are of little use.  Combining the open source intelligence tooling with an open data policy framework creates a future where open source matters.

The combination of programmatic open data policies and open intelligence is powerful.  Open data policies would make it possible for new competitors to create something amazing without needing to generate huge sets of data themselves; all they need are users excited enough about what they are building to agree to share the data that already exists.  Much like the market for open data, this could create a market for intelligence.  Instead of being tied to what our existing providers build and decide we need, we might opt to use intelligence services that are tailored to our lives.  Interested in health and wellness?  Use your data as currency to buy an intelligence service that pulls together all your data from across your other providers to suggest personalized ways to be healthier.  Music nut?  Maybe a different intelligence service that looks at parts of your data that correspond to your mood and puts together the perfect playlist.  Trying to save money?  How about an intelligence that analyzes all the waste in your life and suggests ways to improve it?  None of these things will reach their full potential without the ability to use data from across your ubiquitous computing experience.  Importantly, with open data policies, you are in control of how your data is used and you can shape that control collaboratively with others who want the same thing you do.

What happens if we don’t do this?  What happens if we continue to allow our data to be locked up in proprietary silos owned by what will rapidly become legacy providers?  If that trend continues we’ll be back where we started.  Closed ecosystems, black boxes of logic with an API, no ownership of our own data or our own future, and a bar set so high that new entrants into the market will need pockets deeper than most possess to get a foothold.  It is this last point that worries me the most, and where a continued commitment to open source will have the biggest impact.  As I pointed out earlier, I don’t expect that most people will have the time, skills or resources to build their own solutions end to end.  That’s not the point.  The point is keeping the barriers to entry as low as they can be so the next generation of innovations can be born in places other than the R&D labs of the world’s biggest companies.  The democratization of technology birthed the web as we know it, and it would be a shame to lose that now.

What’s next?  How do we make this better future happen?  Fortunately, many of the pieces are already falling into place, and more are coming.  Groups like the Open Source Hardware Association (OSHWA) are defining what it means to be open source hardware.  The non-profit AI research company OpenAI has backing from the industry and publishes public papers and open source tools in the intelligence space.  The European General Data Protection Regulation (GDPR) contains important language about the right of access (Article 15), the right to erasure (Article 17), and data portability (Article 20) that puts ownership of your data back where it belongs.  With you.  Open source projects around big data, IoT, intelligence and other key technologies continue to thrive, and with a choice of utility computing providers you can spin them up without much upfront investment.  If an open future matters to you (and it should!), seek out and support these organizations.  Find a way to participate in the open source ecosystem around the work you do.  Support legislation that gives you control over your data.

This article isn’t meant to be gloomy; I think the future of open source is brighter than ever.  Realizing that future means we need to look carefully at ways to ensure that things other than just the code are open.

Image Credit: "Teaching Open Source Practices, Version 4.0" by Libby Levi is licensed under CC BY 2.0

Making Chat Content Flow with Alfresco


Let’s start with an axiom:  In a modern business, chat is, as much as email, where business gets done.  Every company I have worked with or for in the past decade has come to rely increasingly on chat to coordinate activities within and across teams.  This is great, as chat provides a convenient, asynchronous way to work together.  It fits nicely in that niche between a phone call and an email in terms of urgency of response and formality.  It is no surprise, then, that the uptake of chat tools for business has been high.

I’m a big proponent of using chat for our teams, but using it has uncovered a few challenges.  One of them is managing the content that comes out of chats.  Just about every chat tool has a content sharing facility for sending docs back and forth.  That’s great, but what happens to that content once it is pasted into a chat?  Often that content becomes somewhat ephemeral, maybe viewed when it is dropped into the chat but then forgotten.  What happens when you have segments of a chat that are valuable and should be saved?  If you are using chat as a support channel, for example, that chat content may well form the foundation for knowledge that you want to capture and reuse.  If the chat is related to a deal that sales is working, you might want to capture it as part of your sales process so others know what is happening.

This sort of "knowledge leakage" in chat can be partially addressed by the search functions built into chat tools, but those are often limited.  Some tools can only go back a certain amount of time or a certain number of messages with their baked-in search functions.  The quality of this search depends on which tool you are using and whether you are using a paid or free version of that tool.  Frustratingly, chat tools do not typically index the content of documents shared via chat.  This means you can probably search for the title of the document or the context in which it was shared, but not the content of the document itself.  Regardless of the quality of search, it only solves part of the problem.  Content in chat may be discoverable, but it isn’t easily shareable or captured in a form that is useful for attaching or including in other processes.  In short, chat content creates chaos.  How can we tame this and make chat a better channel for sharing, capturing, curating and finding real knowledge?

Within our Customer Success team at Alfresco we have done some small research projects into this problem, and have even solved it to a certain extent.  Our first crack at this came in the form of an application that captures certain chats from our teams and saves those chat logs into an Alfresco repository.  This is a great start, as it partially solves one of our problems.  Chat logs are no longer ephemeral; they are captured as documents and saved to Alfresco’s Content Services platform.  From there they are indexed, taggable and linkable, so we can easily share something that came up in the chat with others, in context.  This approach is great for capturing whole chats, but what about saving selected segments, or capturing documents that are attached to chats?

Solving both of these problems is straightforward with Alfresco’s content services platform, a good chat tool with a great API, and a little glue.  For this solution I have set out a few simple goals:

  1. The solution should automatically capture documents added to a chat, and save those documents to Alfresco.
  2. The solution should post a link to the saved document in the chat in which the document originated so that it is easy to find in the content repository.  This also ensures that captured chat logs will have a valid link to the content.  We could ask people to post a doc somewhere and then share a link in the chat, but why do that when we can make it frictionless?
  3. The solution should allow chats or segments of chats to be captured as a text document and saved to Alfresco.
  4. The solution should allow for searching for content without leaving the chat, with search results posted to the chat.

Looking at the goals above, a chat bot seems like a fairly obvious solution.  Chat bots can listen in on a chat channel and act automatically when certain things happen in the chat, or they can be called into action explicitly as needed.  A simple chat bot that speaks Alfresco could meet all of the requirements.  Such a bot would be added to a chat channel and listen for documents being uploaded to the chat.  When that happens the bot can retrieve the document from the chat, upload it to an Alfresco repository, and then post the link back to the chat.  The bot would also need to listen for itself to be called upon to archive part of a chat, at which point it retrieves the specified part of the chat from the provider’s API, saves it to Alfresco and posts a link back to the chat.  Finally, the bot would need to listen for itself to be invoked to perform a search, fetch the search terms from the chat, execute a search in Alfresco and post the formatted results back to the chat channel.  This gives us the tools we need to make content and knowledge capture possible in chat without putting a bunch of extra work on the plate of the chat users.
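To make the first goal more concrete, here is a minimal sketch of the "save the shared file to Alfresco and build a link to post back" step, written in Java against the Apache Chemistry OpenCMIS client library.  It assumes a stock Alfresco install on localhost with the standard public CMIS endpoint, admin credentials, and a pre-created folder per channel; the Share URL construction at the end is also an assumption.  Treat it as an illustration of the flow, not the actual bot implementation (the proof of concept described below is built with BotKit).

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.chemistry.opencmis.client.api.Document;
import org.apache.chemistry.opencmis.client.api.Folder;
import org.apache.chemistry.opencmis.client.api.Repository;
import org.apache.chemistry.opencmis.client.api.Session;
import org.apache.chemistry.opencmis.client.api.SessionFactory;
import org.apache.chemistry.opencmis.client.runtime.SessionFactoryImpl;
import org.apache.chemistry.opencmis.commons.PropertyIds;
import org.apache.chemistry.opencmis.commons.SessionParameter;
import org.apache.chemistry.opencmis.commons.data.ContentStream;
import org.apache.chemistry.opencmis.commons.enums.BindingType;
import org.apache.chemistry.opencmis.commons.enums.VersioningState;

public class ChatFileArchiver {

    // Assumes a stock Alfresco install; adjust host and credentials for your environment.
    private static final String ATOMPUB_URL =
            "http://localhost:8080/alfresco/api/-default-/public/cmis/versions/1.1/atom";

    /** Uploads a file shared in chat and returns a link the bot can post back to the channel. */
    public String archive(String channel, String fileName, String mimeType, byte[] bytes) {
        SessionFactory factory = SessionFactoryImpl.newInstance();
        Map<String, String> params = new HashMap<>();
        params.put(SessionParameter.USER, "admin");
        params.put(SessionParameter.PASSWORD, "admin");
        params.put(SessionParameter.ATOMPUB_URL, ATOMPUB_URL);
        params.put(SessionParameter.BINDING_TYPE, BindingType.ATOMPUB.value());
        List<Repository> repositories = factory.getRepositories(params);
        Session session = repositories.get(0).createSession();

        // Assumes a folder per channel already exists under /Chat Archive.
        Folder folder = (Folder) session.getObjectByPath("/Chat Archive/" + channel);

        Map<String, Object> props = new HashMap<>();
        props.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
        props.put(PropertyIds.NAME, fileName);
        ContentStream stream = session.getObjectFactory().createContentStream(
                fileName, bytes.length, mimeType, new ByteArrayInputStream(bytes));
        Document doc = folder.createDocument(props, stream, VersioningState.MAJOR);

        // CMIS ids carry the node uuid (and sometimes a version suffix); the Share URL
        // format below is an assumption and may need adjusting for your install.
        String uuid = doc.getId().replaceAll(";.*$", "").replaceAll("^.*/", "");
        return "http://localhost:8080/share/page/document-details?nodeRef=workspace://SpacesStore/" + uuid;
    }
}
```

The bot framework piece, listening for file upload events, pulling the file down from the chat provider and posting the resulting link as a message, would wrap around a helper like this.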

I’ve put together a little proof of concept for this idea, and released it as an open source project on GitHub (side note, thank you Andreas Steffan for Dockerizing it!).  It’s implemented as a chatbot for Slack that uses the BotKit framework for the interaction bits.  It’s a simplistic implementation, but it gets the point across.  Upon startup the bot connects to your Slack and can be added to a chat just like any other bot / user.  Once it is there, it will listen for its name and for file upload events, responding more or less as described above.  Check out the readme for more info.

I’d love to see this project grow and improve.  A few things that I’d like to do right away:

  1. Rework the bot to use the Alfresco Unified JavaScript API instead of CMIS and direct REST API calls.
  2. Get smarter about the way content gets stored in Alfresco.  Right now it’s just a little logic that builds a folder structure by channel, this could be better.
  3. Improve the way metadata is handled for saved documents / chats so they are easier to find in the future.  Maybe a field that stores chat participant names?
  4. Some NLP smarts.  Perhaps run a chat excerpt through AWS Comprehend and tag the chat excerpt with the extracted topics?
  5. Workflow integration.  I’d love to see the ability to post a document to the chat and request a review which then triggers an Alfresco Process Services process.

If you want to work on this together, pull requests are always welcome!

 

Spinning up the SmarterBham project at Code for Birmingham


There are few things that get my inner geek as excited as the intersection of technology and the public sphere.  We have only begun to scratch the surface of the ways that technology can improve governance, and the ways we can put the power to transform the places people live in the hands of those people themselves.  This sort of civic hacking has been promoted by groups like Code for America for some time.  Code for America is loosely organized into "brigades" that each serve a particular city.  These independent units operate all over the US, and have gone worldwide.  Like any town worth its salt, Birmingham has its own brigade.  I first became aware of it back in 2015, attended a few meetings, and then it fell off my radar.  The group has produced a lot of valuable work, including an app for spotting and reporting potholes, contributions to open data policies, and traffic accident analysis.

For about a year now I’ve grown increasingly interested in building IoT devices for monitoring various aspects of city life.  My first project was an air quality monitor (which is still up and running!).  At the same time I got interested in The Things Network and other ways citizens can participate in and own the rollout of IoT projects at scale.  The price of technology has dropped so far and connectivity has become so ubiquitous that it is entirely feasible for a group of dedicated people to roll out their own IoT solutions with minimal monetary investment.

When these two things collided, something started happening.  Some of the folks at Code for Birmingham got excited.  I got excited.  Community partners got excited.  We made a plan.  We designed some things.  We ordered parts.  We started coding.  We made a pitch deck (because of course you need a pitch deck).  We applied for grants.  We built a team.  A couple months down the road we’re making serious progress.  One of our team members has made huge strides in building a prototype.  Another has started on our AWS templates.  We’re getting there.

Take a look at what we’re building and if you want to be a part of something awesome, get in touch.  We need designers, coders, CAD gurus, testers, writers, data wizards, and of course, some dreamers.  All are welcome.

 

(Possibly) Enhancing Alfresco Search with Stanford CoreNLP


Laurence Hart recently published an article on CMSWire about AI and enterprise search that I found interesting.  In it, he lays out some good arguments about why the expectations for AI and enterprise search are a bit overinflated.  This is probably a natural part of the hype cycle that AI is currently traversing.  While AI probably won’t revolutionize enterprise search overnight, it definitely has the potential to offer meaningful improvements in the short term.  One of the areas where I think we can get some easy improvements is by using natural language processing to extract things that might be relevant to search, along with some context around those things.  For example, it is handy to be able to search for documents that contain references to people, places, organizations or specific dates using something more than a simple keyword search.  It’s useful for your search to know the difference between the china you set on your dinner table and China the country, or Alfresco the company vs. eating outside.  Expanding on this work, it might also be useful to do some sentiment analysis on a document, or extract specific parts of it for automatic classification.

Stanford offers a set of tools to help with common natural language processing (NLP) tasks.  The Stanford CoreNLP project consists of a framework and a variety of annotators that handle tasks such as sentiment analysis, part-of-speech tagging, lemmatization, named entity extraction, etc.  My favorite thing about this particular project is how they have dropped the barrier to trying it out to effectively zero.  If you want to give the project a spin and see how it would annotate some text with the base models, Stanford helpfully hosts a version you can test out.  I spent an afternoon throwing text at it, both bits I wrote, and bits that come from some of my test document pool.  At first glance it seems to do a pretty good job, even with nothing more than the base models loaded.

I’d like to prove out some of these concepts and explore them further, so I’ve started a simple project to connect Stanford CoreNLP with the Alfresco Content Services platform.  The initial goals are simple:  Take text from a document stored in Alfresco, run it through a few CoreNLP annotators, extract data from the generated annotations, and store that data as Alfresco metadata.  This will make annotation data such as named entities (dates, places, people, organizations) directly searchable via Alfresco Search Services.  I’m starting with an Alfresco Repository Action that calls CoreNLP since that will be easy to test on individual documents.  It would be pretty straightforward to take this component and run it as a metadata extractor, which might make more sense in the long run.  Like most of my Alfresco extension or integration projects, this roughly follows the Service Action Pattern.
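The Alfresco side of that flow is mostly plumbing.  Below is a rough sketch of the service method such an action could delegate to, assuming a hypothetical custom content model with a multi-valued nlp:entities text property.  The class and property names are illustrative, not from the actual project, and the CoreNLP call itself is sketched a bit further down.

```java
import java.util.ArrayList;

import org.alfresco.model.ContentModel;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.namespace.QName;

public class NlpEnrichmentService {

    // Hypothetical custom model: nlp:entities would be a multi-valued d:text property.
    private static final QName PROP_ENTITIES =
            QName.createQName("http://www.example.com/model/nlp/1.0", "entities");

    private ContentService contentService;  // injected via Spring
    private NodeService nodeService;         // injected via Spring

    public void setContentService(ContentService contentService) { this.contentService = contentService; }
    public void setNodeService(NodeService nodeService) { this.nodeService = nodeService; }

    /** Reads the text of a node, runs it through CoreNLP and stores the entities as metadata. */
    public void enrich(NodeRef nodeRef) {
        ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
        if (reader == null || !reader.exists()) {
            return;
        }
        // Assumes text-bearing content; other mimetypes would need a transform to text first.
        String text = reader.getContentString();

        // The CoreNLP call is sketched separately below; an ArrayList is Serializable,
        // so it can be stored directly as a multi-valued property value.
        ArrayList<String> entities = new ArrayList<>(new CoreNlpEntityExtractor().extractEntities(text));
        nodeService.setProperty(nodeRef, PROP_ENTITIES, entities);
    }
}
```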

Stanford CoreNLP makes the integration bits pretty easy.  You can run CoreNLP as a standalone server, and the project helpfully provides a Java client (StanfordCoreNLPClient) that somewhat closely mirrors the annotation pipeline, so if you already know how to use CoreNLP locally, you can easily get it working from an Alfresco integration.  This will also help with scalability, since CoreNLP can be memory hungry and running the NLP engine in a separate JVM or server from Alfresco definitely makes sense.  It also makes sense to be judicious about which annotators you run, so that should be configurable in Alfresco.  It also makes sense to limit the size of the text that gets sent to CoreNLP, so long term some pagination will probably be necessary to break down large files into more manageable pieces.  The CoreNLP project itself provides some great guidance on getting the best performance out of the tool.

A few notes about using CoreNLP programmatically from other applications.  First, if you just provide a host name (like localhost) then CoreNLP assumes that you will be connecting via HTTPS.  This will cause the StanfordCoreNLPClient to not respond if your server isn’t set up for it.  Oddly, it also doesn’t seem to throw any kind of useful exception; it just sort of, well, stops.  If you don’t want to use HTTPS, make sure to specify the protocol in the host name.  Second, Stanford makes it pretty easy to use CoreNLP in your application by publishing it on Maven Central, but the model jars aren’t there.  You’ll need to download those separately.  Third, CoreNLP can use a lot of memory for processing large amounts of text.  If you plan to do this kind of thing at any kind of scale, you’ll need to run the CoreNLP bits on a separate JVM, and possibly a separate server.  I can’t imagine that Alfresco under load and CoreNLP in the same JVM would yield good results.  Fourth, the client also has hefty memory requirements.  In my testing, running the CoreNLP client in an Alfresco action with less than 2GB of memory caused out of memory errors when processing 5-6 pages of dense text.  Finally, the pipeline that you feed CoreNLP is ordered.  If you don’t have the correct annotators in there in the right order, you won’t get the results you expect.  Some annotators have dependencies, which aren’t always clear until you try to process some text and it fails.  Thankfully the error message will tell you which other annotators you need in the pipeline for it to work.
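Pulling those notes together, here is a sketch of the CoreNlpEntityExtractor helper referenced in the earlier service sketch.  The class name, host, port and annotator list are my own choices for illustration; only StanfordCoreNLPClient and the annotation API come from CoreNLP itself.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLPClient;
import edu.stanford.nlp.util.CoreMap;

public class CoreNlpEntityExtractor {

    /** Sends text to a CoreNLP server and returns "word [TYPE]" strings for named entity tokens. */
    public Set<String> extractEntities(String text) {
        Properties props = new Properties();
        // Order matters: ner needs tokenize, ssplit, pos and lemma ahead of it in the pipeline.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");

        // Note the explicit http:// -- a bare host name makes the client assume HTTPS.
        StanfordCoreNLPClient pipeline = new StanfordCoreNLPClient(props, "http://localhost", 9000, 2);

        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);

        Set<String> entities = new LinkedHashSet<>();
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                String nerTag = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
                if (nerTag != null && !"O".equals(nerTag)) {
                    entities.add(token.word() + " [" + nerTag + "]");
                }
            }
        }
        return entities;
    }
}
```

In a real extension the host, port and annotator list would be configuration rather than constants, and adjacent tokens of the same entity type would probably be merged into a single mention.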

After some experimentation I’m not sure that CoreNLP is really well suited for integration with a content services platform.  I had hoped that when using StanfordCoreNLPClient to connect to a server, most of the processing would take place on the server and only the results would be returned, but that doesn’t appear to be the case.  I still think that using NLP tools to enhance search has merit though.  If you want to play around with this idea yourself you can find my PoC code on GitHub.  It’s a toy at this point, but might help others understand Alfresco, some intricacies of CoreNLP, or both.  As a next step I’m going to look at OpenNLP and a few other tools to better understand both the concepts and the space.

 

A Simple Pattern for Alfresco Extensions

Over the years that I have worked with and for Alfresco, I have written a ton of Alfresco extensions.  Some of these are for customers, some are for my own education, some for R&D spikes, etc.  I’d like to share a common pattern that comes in handy.  If you are a super experienced Alfresco developer, this article probably isn’t for you.  You know this stuff already!

There are a lot of ways to build Alfresco extensions, and a lot of ways to integrate your own code or connect Alfresco to another product.  There are also a lot of ways you might want to call your own code or an integration, whether that is from an Action, a Behavior, a Web Script, a scheduled job, or via the Alfresco JavaScript API.  One way to make your extension as flexible as possible is to use what could informally be called the "Service Action Pattern".

The Service Action Pattern


Let’s start by describing the Service Action Pattern.  In this pattern, we take the functionality that we want to make available to Alfresco and we wrap it in a service object.  This is a well-established pattern in the Alfresco world, used extensively in Alfresco’s own public API.  Things like the NodeService, ActionService, ContentService, etc. all take core functionality found in the Alfresco platform and wrap it in a well-defined service interface consisting of a set of public APIs that return Alfresco objects like NodeRefs, Actions, Paths, etc., or Java primitives.  The service object is where all of our custom logic lives, and it provides a well-defined interface for other objects to use.  In many ways the service object acts as a sort of adapter, in that we are using it to translate back and forth between the domain-specific objects that your extension requires and Alfresco objects.  When designing a new service in Alfresco, I find it is a best practice to limit the types of objects returned by the service layer to those things that Alfresco natively understands.  If your service object method creates a new node, return a NodeRef, for example.
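As a concrete illustration, a hypothetical service for the chat archiving use case described earlier might expose an interface like the one below.  The name and method signature are made up for the example; the point is the shape: Alfresco-native types and Java primitives in, Alfresco-native types out.

```java
import org.alfresco.service.cmr.repository.NodeRef;

/**
 * Hypothetical service object following the Service Action Pattern.
 * It accepts and returns only Alfresco-native types and Java primitives,
 * keeping the custom logic behind a well-defined boundary.
 */
public interface ChatArchiveService {

    /**
     * Stores a chat transcript as a new document under the given folder.
     *
     * @return the NodeRef of the newly created transcript document
     */
    NodeRef archiveTranscript(NodeRef targetFolder, String channelName, String transcript);
}
```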

A custom service object on its own isn’t terribly useful, since Alfresco doesn’t know what to do with it.  This is where an Alfresco Action comes in handy.  We can use one or more Alfresco Actions to call the services that our service object exposes.  Creating an action to call the service object has several advantages.  First, once you have an Action you can easily call that Action (and thus the underlying service object) from the JavaScript API (more on this in a moment).  Second, it is easy to take an Action and surface it in Alfresco Share for testing or so your users can call it directly.  Actions can also be triggered by folder rules, which can be useful if you need to call some code when a document is created or updated.  Finally, Actions are registered with Alfresco, which makes them easy to find and call from other Java or server-side JavaScript code via the ActionService.  If you want to do something to a file or folder in Alfresco there is a pretty good chance that an Action is the right approach.
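Sticking with the hypothetical ChatArchiveService above, the Action wrapper is mostly parameter handling and delegation.  Here is a sketch of the usual shape; the class would still need to be declared as a Spring bean with the standard action-executer parent so Alfresco registers it.

```java
import java.util.List;

import org.alfresco.repo.action.ParameterDefinitionImpl;
import org.alfresco.repo.action.executer.ActionExecuterAbstractBase;
import org.alfresco.service.cmr.action.Action;
import org.alfresco.service.cmr.action.ParameterDefinition;
import org.alfresco.service.cmr.dictionary.DataTypeDefinition;
import org.alfresco.service.cmr.repository.NodeRef;

/** Thin Action wrapper; the real logic lives in the (hypothetical) ChatArchiveService. */
public class ArchiveTranscriptActionExecuter extends ActionExecuterAbstractBase {

    public static final String NAME = "archive-transcript";
    public static final String PARAM_CHANNEL = "channelName";
    public static final String PARAM_TRANSCRIPT = "transcript";

    private ChatArchiveService chatArchiveService;  // injected via Spring

    public void setChatArchiveService(ChatArchiveService chatArchiveService) {
        this.chatArchiveService = chatArchiveService;
    }

    @Override
    protected void executeImpl(Action action, NodeRef actionedUponNodeRef) {
        String channel = (String) action.getParameterValue(PARAM_CHANNEL);
        String transcript = (String) action.getParameterValue(PARAM_TRANSCRIPT);

        // Sanity checks and wiring only; the service does the actual work.
        chatArchiveService.archiveTranscript(actionedUponNodeRef, channel, transcript);
    }

    @Override
    protected void addParameterDefinitions(List<ParameterDefinition> paramList) {
        paramList.add(new ParameterDefinitionImpl(PARAM_CHANNEL, DataTypeDefinition.TEXT,
                true, getParamDisplayLabel(PARAM_CHANNEL)));
        paramList.add(new ParameterDefinitionImpl(PARAM_TRANSCRIPT, DataTypeDefinition.TEXT,
                true, getParamDisplayLabel(PARAM_TRANSCRIPT)));
    }
}
```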

Using the Service Action Pattern also makes it simple to expose your service object as a REST API.  Remember that Alfresco Actions can be located and called easily from the JavaScript API.  The JavaScript API also happens to be (IMHO) the simplest way to build a new Alfresco Web Script.  If you need to call your Action from another system (a very common requirement) you can simply create a web script that exposes your action as a URL and call away.  This does require a bit of boilerplate code to grab request parameters and pass them to the Action, which in turn will call your service object.  It isn’t too much, and there are lots of great examples in the Alfresco documentation and out in the community.

So why not just bake the code into the Action itself?  Good question!  First, any project of some complexity is likely to have a group of related functions.  A good example can be found in the AWS Glacier Archive for Alfresco project we built a couple of years ago at an Alfresco hack-a-thon.  This project required us to have Actions for archiving content, initiating a retrieval, and retrieving content.  All of these Actions are logically and functionally related, so it makes sense to group them together in a single service.  If you want the details of how Alfresco integrates with AWS Glacier, you just have to look at the service implementation class; the Action classes themselves are just sanity checks and wiring.  Another good reason to put your logic into a service class is for reuse outside of Actions.  Actions carry some overhead, and depending on how you plan to use your logic you may want to make it available directly to a behavior or expose it to the Alfresco JavaScript API via a root scope object.  Both of these are straightforward if you have a well-defined service object.

I hope this helps you build your next awesome Alfresco platform extension, I have found it a useful way to implement and organize my Alfresco projects.

Alfresco Premier Services – New Blog


I’m not shy about saying the best thing about my job is my team.  Never in my career have I worked with such a dedicated, skilled and fun group of people.  Whether you are talking about our management team or our individual contributors, the breadth and depth of experience across the Alfresco Premier Services team is impressive.  Members of our team have developed deep expertise across our product line, from auditing to RM, from workflow to search.  Some folks on the team have even branched out and started their own open source projects around Alfresco.  We have decided to take the next step in sharing our knowledge and launch a team blog.

The inaugural post details some recent changes to the Alfresco Premier Services offerings that coincide with the digital business platform launch.  We will follow that up in short order with new articles covering both content and process services including guidance on FTP load balancing and pulling custom property files into a process services workflow.

Lots of exciting stuff to come from the Premier Services Team, stay tuned!

Open Source in an AI World. Open Matters More Now Than Ever.


Technological unemployment is about to become a really big problem.  I don’t think the impact of automation on jobs is in any doubt at this point; the remaining questions are mostly around magnitude and timeline.  How many jobs will be affected, and how fast will it happen?  One of the things that worries me the most is the inevitable consolidation of wealth that will come from automation.  When you have workers building a product or providing a service, a portion of the wealth generated by those activities always flows to the people that do the work.  You have to pay your people, provide them benefits, time off, etc.  Automation changes the game, and the people that control the automation are able to keep a much higher percentage of the wealth generated by their business.

When people talk about technological unemployment, they often talk about robots assuming roles that humans used to do.  Robots to build cars, to build houses, to drive trucks, to plant and harvest crops, etc.  This part of the automation equation is huge, but it isn’t the only way that technology is going to make some jobs obsolete.  Just as large (if not larger) are the more ethereal ways that AI will take on larger and more complex jobs that don’t need a physical embodiment.  Both of these things will affect employment, but they differ in one fundamental way:  Barrier to entry.

High barriers

Building robots requires large capital investments for machining, parts, raw materials and other physical things.  Buying robots from a vendor frees you from the barriers of building, but you still need the capital to purchase them as well as an expensive physical facility in which you can deploy them.  They need ongoing physical maintenance, which means staff where the robots are (at least until robots can do maintenance on each other).  You need logistics and supply chain for getting raw materials into your plant and finished goods out.  This means that the financial barrier to entry for starting a business using robots is still quite high.  In many ways this isn’t so different from starting a physical business today.  If you want to start a restaurant you need a building with a kitchen, registers, raw materials, etc.  The difference is that you can make a one time up-front investment in automation in exchange for a lower ongoing cost in staff.  Physical robots are also not terribly elastic.  If you plan to build an automated physical business, you need to provision enough automation to handle your peak loads. This means idle capacity when you aren’t doing enough business to keep your machines busy.  You can’t just cut a machine’s hours and reduce operating costs in the same way you can with people.  There are strategies for dealing with this like there are in human-run facilities, but that’s beyond the scope of this article.

Low barriers

At the other end of the automation spectrum is AI without a physical embodiment.  I’ve been unable to find an agreed upon term for this concept of a “bodiless” AI.  Discorporate AI?  Nonmaterial AI?  The important point is that this category includes automation that isn’t a physical robot.  Whatever you want to call it, a significant amount of technological unemployment will come from this category of automation.  AI that is an expert in a given domain will be able to provide meaningful work delivered through existing channels like the web, mobile devices, voice assistants like Alexa or Google Home, IoT devices, etc.  While you still need somewhere for the AI to run, it can be run on commodity computing resources from any number of cloud providers or on your own hardware.  Because it is simply applied compute capacity, it is easier to scale up or down based on demand, helping to control costs during times of low usage.  Most AI relies on large data sets, which means storage, but storage costs continue to plummet to varying degrees depending on your performance, retrieval time, durability and other requirements.  In short, the barrier to entry for this type of automation is much lower.  It takes a factory and a huge team to build a complete market-ready self driving car.  You can build an AI to analyze data and provide insights in a small domain with a handful of skilled people working remotely.  Generally speaking, the capital investment will be smaller, and thus the barrier to entry is lower.

Open source democratizes AI

I don’t want to leave you with the impression that AI is easy.  It isn’t.  The biggest players in technology have struggled with it for decades.  Many of the hardest problems are yet to be solved.  On the individual level, anybody who has tried Siri, Google Assistant or Alexa can attest to the fact that while these devices are a huge step forward, they get a LOT wrong.  Siri, for example, was never able to respond correctly when I asked it to play a specific genre of music.  This is a task that a 10 year old human can do with ease.  It still requires a lot of human smarts to build out fairly basic machine intelligence.

Why does open source matter more now than ever?  That was the title of this post, after all, and it’s taking an awfully long time to get to the point.  The short version is that open source AI technologies further lower the barriers to entry for the second category of automation described above.  This is a Good Thing because it means that the wealth created by automation can be spread across more people, not just those that have the capital to build physical robots.  It opens the door for more participation in the AI economy, instead of restricting it to a few companies with deep pockets.

Whoever controls automation controls the future of the economy, and open source puts that control in the hands of more people.

Thankfully, most areas of AI are already heavily colonized by open source technologies.  I’m not going to put together a list here; Google can find you more comprehensive answers.  Machine learning / deep learning, natural language processing, and speech recognition and synthesis all have robust open source tools supporting them.  Most of the foundational technologies underpinning these advancements are also open source.  The most popular languages for doing AI research are open.  The big data and analytics technologies used for AI are open (mostly).  Even robotics and IoT have open platforms available.  What this means is that the tools for using AI for automation are available to anybody with the right skills to use them and a good idea for how to apply them.  I’m hopeful that this will lead to broad participation in the AI boom, and will help mitigate to a small degree the trend toward wealth consolidation that will come from automation.  It is less a silver bullet, more of a silver lining.

Image Credit: By Johannes Spielhagen, Bamberg, Germany [CC BY-SA 3.0], via Wikimedia Commons

Open Source is the Surest and Shortest Path to Digital Transformation

Back in 2013, Mike Olson, a co-founder of Cloudera, famously stated that "No dominant platform-level software infrastructure has emerged in the last 10 years in closed-source, proprietary form."  He’s absolutely right about that.  John Newton underscored this theme at a recent Alfresco Day event in NYC.  He shared this slide as a part of his presentation, which I think does a great job showing how much of our modern platforms are dependent on the open source ecosystem:

[Slide: open source components underpinning modern platforms]

Platforms are more open today than they have ever been, with a few exceptions (I’m glaring at my iPhone in annoyance as I write this).  Quite a few companies seem to have figured out the secret sauce of blending open platforms with proprietary value-adds to create robust, open ecosystems AND be able to pay the bills in the process.  This is very good news for you if you are pursuing a digital transformation strategy.

Why open source and open standards?

The advantages of open source are pretty well established at this point.  Open projects are more developer friendly.  They innovate faster.  They can fork and merge and rapidly change direction if the community needs that to happen (although there are good and bad forks).  Open has become the de facto way that the digital business works today.  I’d challenge you to find any team within your organization that isn’t using at least one open source project or library.  Open has won.  That’s the first big advantage of open source in digital transformation today:  It’s ubiquitous.  You can find a platform or component to fill just about any need you have.

Open is also faster to try, and removes a lot of friction when testing out a new idea.  Effective digital transformation relies on speed and agility.  It’s a huge advantage to simply pull down a build of an open source technology you want to try out, stand it up and get to work.  That allows bad ideas to fail fast, and good ideas to flourish immediately.  Since testing ideas out is effectively free in terms of dollar cost, and cheap in terms of time and cognitive investment, I think we also tend to be more willing to throw a prototype out and start over if we don’t get the results we want.  That’s a good thing, as it ultimately leads to less time spent trying to find a bigger hammer to slam that square peg into a round hole.  If you decide to go forward with a particular technology, you’ll find commercial organizations standing behind many of these projects with support and value-added components that can take an open source core to the next level.

If digital transformation relies on speed of innovation, then open technologies look even more appealing.  Why do open source projects tend to out-innovate their purely proprietary competitors?  There are probably a lot of reasons.  An open project isn’t limited to contributors from only one company.  Great ideas can come from anywhere and often do.  At their best, large open source projects function as meritocracies.  This is especially true of foundational platform technologies that may have originated at or get contributions from tech leaders.  These are the same technologies that can power your own digital transformation.

Open source projects also make the pace of innovation easier to manage, since you get full transparency into what has changed version to version and visibility into the future direction of the project.  Looking at pending pull requests or commits between releases gives you a view into what is evolving in the project so that you can plan accordingly.  In a very real sense, pursuing a digital transformation strategy using open technologies forces you to adopt a modular, swappable, services-driven approach.  Replacing a monolithic application stack every cycle is not possible, but replacing or upgrading individual service components in a larger architecture is, and open source makes that easier.

Software eats the world, and is a voracious cannibal

There is a downside to this pace of change, however.  Because open source projects innovate so quickly, and because the bar to creating one is so low, we often see exciting projects disrupted before they can really deliver on their promise.  Just when the people responsible for architecture and strategy start to fully understand how to exploit a new technology, the hype cycle begins on something that will supersede it.  Nowhere is this as bad as it is in the world of JavaScript frameworks where every week something new  and shiny and loud is vying for developers’ limited time and attention.  Big data suffers from the same problem.  A few years ago I would have lumped NoSQL (I hate that term, but it has stuck) databases into that category as well, but the sector seems to have settled down a little bit.

There is also a risk that an open source technology will lose its way, lose its user and developer base and become abandonware.  Look for those projects that have staying power.  Broad user adoption, frequent commits and active discussions are all good signs.  Projects that are part of a well-established organization like the Apache Software Foundation are also usually a good bet.  Apache has a rigorous incubation process that a project must complete to become a full-blown, top-level project, and this drives a level of discipline that independent projects may lack.  A healthy company standing behind the project is another good sign, as that means there are people out there with a financial interest in the project’s success.

Simply using open source projects to build your platform for transformation is no guarantee of success, but I would argue that carefully selecting the right open components does tilt the odds in your favor.

Building an Open IoT Network in Birmingham. By the Users, for the Users

One of the big challenges in any IoT project is connectivity.  In a few proofs of concept and prototype projects I have worked on, the choices have basically come down to either WiFi or 3G/4G connections.  Both are ubiquitous and have their place, but both also have significant drawbacks that hinder deployment.  WiFi usually requires access codes, has crap range, chews up battery, and has FAR more bandwidth than most IoT projects really need.  3G/4G means a subscription or some kind of data plan, and most carriers aren’t exactly easy to work with.  While platforms like Particle make this easier, it is still relatively expensive to send data, and I’d like more choice in which embedded platform to use.  Are there any good alternatives?

Turns out there are, and one alternative in particular is appealing for the kind of open IoT projects that will drive us toward the future.  LoRaWAN is a Low Power Wide Area Network (LPWAN) specification governed by an open, non-profit organization that aims to drive adoption and guarantee interoperability.  With members such as Cisco, IBM and Semtech, and an experienced board consisting of senior leaders from many of these same companies and others, the LoRa Alliance is well positioned to make this happen.  So that’s one possible standard, but how does this enable an open IoT network?  How does it solve the problems laid out earlier and make some kinds of IoT projects easier (or possible at all)?

Enter The Things Network (TTN).  The mission of The Things Network is to create a crowdsourced global LoRaWAN network to foster innovation in much the same way as the early days of the Internet.  By deploying a free, open LPWAN, The Things Network hopes to enable innovators to build and deploy new IoT technologies that can change our communities.  That’s a mission I can get behind!  Check out their manifesto if you want to read about the full scope of their vision.

Our team seeks to build a Things Network community in the Birmingham, Alabama area.  We have already started reaching out to people across our metro in analytics, RF engineering, embedded systems, software development, entrepreneurship, community engagement / advocacy and government, with the goal of building a consortium of local organizations to support a free and open IoT network.  Our vision is to build the open and transparent infrastructure required to support the future of smart cities.  Birmingham is a great place to do this.  The city center is relatively small, so establishing full coverage should be achievable.  We have other smart cities initiatives in the works, including some things funded by an IBM Smarter Cities Challenge grant.  We have an active and growing technology community anchored by such institutions as the Innovation Depot, local groups like TechBirmingham and maker spaces like Red Mountain Makers.  We have active civic organizations with goals across the public sphere from economic development to air quality.  We have a can-do spirit and our eyes aimed firmly toward the future while being well aware of our past.

Assuming we can get a larger team assembled and this network launched, what do we plan to do with it?  A lot of that will come down to the people that join this effort and bring their own ideas to the table.  Initially the first few gateways will be launched in support of an air quality monitoring program using a series of low cost monitors deployed within the city.  Ideally this will expand quickly to other uses, even if those are just proofs of concept.  I, for one, plan to install a simple sensor system to tell me when the parking spaces in front of my condo building are available.  I hope others adopt this platform to explore their own awesome ideas and those ideas go on to inspire our city to become a leader in digital transformation.

I hope you will join us at the Birmingham Things Network Community and help us build the future one node at a time.