Adding a GPS to the Air Quality Monitor

I’ve gotten a little obsessed with this air quality monitoring project I’ve been working on, which tends to happen once I’ve been through the trough of sorrow and a project starts to look like it might actually work.  The last piece that is missing is the location of the readings.  It’s going to be hard to build up the kind of maps that this project calls for without it!  Once again, it’s Adafruit to the rescue with their awesome Ultimate GPS Breakout.  It was (and remains, as of the time this was written) out of stock on Adafruit’s website, but Robotshop and others have it on hand.  I’m growing very fond of Adafruit’s products; they all work as designed, have great docs, etc.

Before adding a new component into an existing project, especially something as complex as a GPS, it always pays to get it working in isolation.  In this case that means wiring up the GPS module to a spare Arduino Uno.  A tip for people just getting started and switching between boards:  make sure you change the port and board type to match what you are actually using!  When I switched from the Mega used to actually run the monitor to the Uno for testing I forgot this step and momentarily thought the Uno was dead.  That’s the kind of thing that happens during late night coding sessions when your brain isn’t quite working right.  After fixing this little issue the GPS powered up quite happily and quickly got a fix.  It only takes four wires to get the Ultimate GPS Breakout up and running.  Just add power, ground and a serial connection.  Simple, right?  The Adafruit product tutorial for Arduino covers all this, and how to use the provided sample sketches to prove it works.

The next step was to get this module integrated into the existing project.  A little easier said than done due to a lack of real estate on the perfboard.  It took some rework and creative wire routing, but it’s in there.  When the board is installed in its enclosure the antenna will not be correctly oriented, but it is good enough for testing outside the enclosure.  The final version will use an external antenna mounted on top of the enclosure anyway so that’s not a big problem.  After wiring it up and updating the monitor’s sketch to use the GPS and send data to our collection server, it could not get a fix.  Moving it outside, reorienting the antenna and changing the serial port had no effect.  After several hours the GPS would just sit there and blink at 1Hz (indicating no fix) and no location data was coming to the collection server.  What’s the problem?
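When the only symptom is a blinking LED and an empty collection server, it helps to look at what the module is actually sending over the serial line.  Here is a minimal sketch in Python (not the monitor's Arduino code) that checks a single sentence for a fix.  It assumes the module is emitting standard NMEA 0183 GGA sentences, as GPS modules of this kind typically do; the function names are made up for illustration.

```python
from functools import reduce


def add_checksum(payload: str) -> str:
    """Wrap an NMEA payload (text between '$' and '*') with its XOR checksum."""
    checksum = reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)
    return f"${payload}*{checksum:02X}"


def nmea_checksum_ok(sentence: str) -> bool:
    """Return True if the two hex digits after '*' match the XOR of the payload."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    payload, _, given = sentence[1:].partition("*")
    computed = reduce(lambda acc, ch: acc ^ ord(ch), payload, 0)
    try:
        return computed == int(given[:2], 16)
    except ValueError:
        return False


def has_fix(sentence: str) -> bool:
    """Return True if a GGA sentence with a valid checksum reports a position fix."""
    if not nmea_checksum_ok(sentence):
        return False
    fields = sentence.strip()[1:].split("*")[0].split(",")
    if not fields[0].endswith("GGA") or len(fields) < 7:
        return False
    # GGA field 6 is the fix quality: 0 means no fix, anything else is a fix.
    return fields[6] not in ("", "0")


if __name__ == "__main__":
    # Illustrative sentences, not captured from the real module.
    good = add_checksum("GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,")
    searching = add_checksum("GPGGA,123519,,,,,0,00,,,M,,M,,")
    print(has_fix(good), has_fix(searching))
```

Sentences that arrive with good checksums but a fix quality of 0 mean the serial link is fine and the receiver simply can't see satellites, which points toward antenna or interference problems rather than wiring or code.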

A little digging around in the docs and on the internet indicates that this module is particularly sensitive to noise in its power supply.  Without an oscilloscope it’s impossible to say for sure how noisy the supply is.  A little filtering couldn’t hurt anything, so I added a few decoupling capacitors to the supply lines.  Did it work?  No.  No it didn’t.  After over an hour the GPS had still failed to obtain a fix.  Back to the drawing board.  I tested the module with the Mega directly and got it working nicely by switching from a software serial port to a hardware port on Serial3.  Great, except for the fact that the same sketch wouldn’t work in the monitor itself.  It only worked if I isolated the Arduino Mega and GPS module on a breadboard.  This result indicated that something else on the air quality monitor board was the cause.

I began to wonder if another component on the board was interfering with the GPS.  Due to space constraints, the GPS module was right next to a WINC1500 Wifi module and a 5V regulator that powers the heaters in the gas sensors.  Could one of these be the cause?  The easiest way to test this is to move the components further apart.  I used a few jumper wires to move the GPS off of the board (about 10 inches away) and the GPS acquired a fix almost immediately.  Progress!  Now it was just a matter of narrowing down exactly which component was the culprit.  Simply for convenience I removed the WINC1500 module first since it wasn’t screwed down.  Again, the GPS acquired a fix almost immediately.  Looks like that is the cause.  The GPS must be getting some kind of interference from the Wifi module.  Lesson learned: these components apparently need some distance if they are to be used in the same project.  This is likely due to the patch antenna built into the Ultimate GPS breakout being right next to the WINC1500 breakout.  Hopefully moving to the external antenna that will ultimately be needed anyway will fix the problem.  If not, remote mounting the GPS module will.

Alfresco and Solr – Search, Reindexing and Index Cluster Size

A question came up from a colleague recently, driven by a customer question.  When is it appropriate to increase the number of Alfresco Index Servers running Solr?  The right direction depends on several factors, and what exactly you are trying to achieve.  Like many questions related to Alfresco architecture, sizing and scalability, the answers can be found in Alfresco’s excellent whitepapers on the subject (full disclosure and shameless self-promotion: I wrote one of them).  Not only are there multiple reasons why you may need to scale up the search tier, there are a couple of different ways to go about doing it.  Hopefully this article will help lend a little clarity to a sometimes confusing topic.

A place to start

A typical customer configuration starts with a number of index servers that roughly matches the number of repository cluster nodes.  The index servers sit behind a load balancer and provide search services to the repository tier.  Each index server maintains its own copy of the index, providing full failover.  It’s common to see a small to medium enterprise deployment running on a 2X2 configuration.  That is, two repository tier servers and two index servers, the minimum for high availability.  As the repository grows, user patterns change or the system is prepared for upgrade, this can prove insufficient from a search and indexing point of view.

Large repositories

When Alfresco first ran our 1B document benchmark we set a target of about 50M documents indexed per index server.  So, for our 1B document environment, we had 20 index shards each containing ~50M docs.  Our testing shows that the system gives solid, predictable performance at this level.  For large repositories, this is a good starting point for planning purposes.  If you have a lighter indexing requirement per document (for example, a small metadata set or no full text indexing) it is likely possible to go higher.  Conversely, if your requirements call for full text indexing of large documents and you have a large, complex metadata set, a smaller number of documents per shard is more appropriate.  Either way, as a repository grows larger at some point you will need to consider adding additional index servers.  As with all things related to scale, take vendor recommendations as a starting point, then test and monitor.
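The planning arithmetic is simple enough to sketch.  This is just a back-of-the-envelope helper in Python, not anything that ships with Alfresco; the function name is made up, and the 50M default is the benchmark target mentioned above, which you should adjust up or down for your own indexing load.

```python
def shards_needed(total_docs: int, docs_per_shard: int = 50_000_000) -> int:
    """Estimate how many index shards a repository needs.

    docs_per_shard defaults to the ~50M planning target from the benchmark;
    lower it for heavy full text indexing, raise it for light metadata.
    """
    if total_docs < 0 or docs_per_shard <= 0:
        raise ValueError("total_docs must be >= 0 and docs_per_shard must be > 0")
    # Integer ceiling division, with a floor of one shard for small repositories.
    return max(1, -(-total_docs // docs_per_shard))


if __name__ == "__main__":
    print(shards_needed(1_000_000_000))  # the 1B document benchmark
    print(shards_needed(120_000_000, docs_per_shard=30_000_000))
```

For the 1B document benchmark this gives the 20 shards described above.  Treat the result as a starting point for testing, not a guarantee.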

Heavy search

Some ECM use cases lean heavily on search services.  For these cases it makes sense to deploy additional index servers to handle the load.  Spreading search requests across a larger number of servers does not improve the single transaction performance, but does allow more concurrent searches to complete quickly.  If your use case relies heavily on search, then you may need to consider adding additional index servers to satisfy those requests.  For this specific case, both sharding and replication can be appropriate.  Both sharding and replication allow you to spread your search load across multiple systems.  So how do you choose?  In most cases sharding is the better option.  It is more flexible and has additional benefits as we will outline in the next section.

If your repository is relatively small (less than 50M documents or so) and you are primarily concerned with search performance, replication can be a good option.  Replication sets up your index servers so that only one is actually tracking the repository and building the index.  This master node then replicates its index out to one or more slaves that are used to service search requests.  The advantage of this configuration is that DB pressure is reduced by only having one index server tracking the repository, and you now have multiple servers with a copy of that index to service search requests.  The downside is that it has a relatively low upper limit of scalability, and introduces a single point of failure for index tracking.  Not such a huge problem though: if the tracking server stops working you can always spin up another and re-seed it with a copy of the index from the slaves.  A replicated scenario may also increase the index lag time (time between adding a document and it appearing in the index) slightly since it must first be written to the master index and then replicated out to the slaves.  Real world testing shows that this delay is minimal, but it is present.

Reindexing and upgrades

There is another case where you may want to consider adding additional index servers, and that is when reindexing the repository or upgrading to a new version of Alfresco.  Alfresco has supported multiple versions of Solr over the years.  Alfresco 4.x used Solr 1.4, 5.0/5.1 use Solr 4, and the upcoming 5.2 release can use Solr 6.  Newer versions of Solr bring great new features and performance improvements, so customers are always eager to upgrade!  There is one thing to look out for though:  reindexing times.  Switching from one version of Solr to another does require that the repository be reindexed.  For very large repositories this can take some time.  This is where sharding is especially helpful.  By breaking the index into pieces (shards) we can parallelize the reindexing process and allow it to complete more quickly.  The fewer documents an individual shard reindexes, the faster it will finish (within reason, 10 docs per shard or something would be ridiculous).  So if you are considering an Alfresco upgrade and are worried about reindexing times, consider additional index servers to speed things along.  Note that most Alfresco upgrades do not require you to switch versions of Solr immediately.  You can continue to run your server on the old index while the new index builds, but during this time you cannot take advantage of Alfresco features that depend on the new index.


This list is by no means comprehensive, but it does outline the three most common reasons I have seen customers add additional index servers.  Have you seen others?  Comment below, I’d love to hear about it!

Air Quality Monitoring, Phase II: What To Do With The Data?

In a few recent blog posts I’ve laid out an air quality monitoring project that has materialized in my spare time.  If you want to start from the beginning the first and second posts about the project lay out the inspiration and goals, and the hardware that takes the measurements.  A few people have asked “OK, so you have this sensor package capturing decent quality data, now what are you going to do with it?”.  Good question, and it has several answers.

Personal questions

The genesis of this project was the desire to test some assumptions that we have had for a long time about air quality in Birmingham, where the pollution comes from and how that has affected where people choose to live.  Hopefully the data captured will be sufficient to do relative comparisons of different parts of the Birmingham metro area.  If so, it should be possible to test these assumptions and determine if the pollution in the Jones Valley is actually worse than it is over the mountain, if elevation up in the mountain area itself has an effect, and how far out from the city you have to go before the effects of industry aren’t as apparent.  I’d like to know before we move!

Community awareness

One of the most obvious uses for a system like this and the data it generates is community awareness.  How bad is the problem?  What can we do to fix it?  What communities are the most impacted?  Do the pollution measurements correlate with other data such as demographics or proximity to specific types of industry?  There is a lot of geo data out there that shows income, education, economic activity, etc.  It will be interesting to overlay the air quality measurements with this other data to see if there are any correlations.  Information is empowering.

Incentivizing change

Citizen science projects are great for engaging with the community, and can help drive change.  One particularly inspiring example is a project in the Netherlands that is providing free Wifi when air quality goals are met.  In phase II, this project will adopt a similar direction with a few tweaks.  The current plan is to tie the sensor array to a system that will provide free Wifi for people in range when the air quality is good, or at least use a captive portal system to show people the current readings as a part of a free Wifi system that is running at all times.  That same captive portal page will also contain links to make strategic donations or renewable energy credit purchases so users can take direct action.


In recent years there has been a huge emphasis on early STEM education.  Whether or not we have a shortage of skilled technical workers is up for debate, but regardless of the truth of it a lot of attention is being paid these days to giving kids the foundation skills.   This little project could be a great introductory program for basic electronics and air chemistry.  Since it is all being developed in the open, using open software and open hardware, there are few barriers to using a project like this in an educational setting.  It could be especially fun to use this sort of project in a cross disciplinary educational approach.  For example, running the ozone sensor over a long period of time alongside plants that are known to be sensitive to ozone could be an interesting way to tie together electronics / software and life sciences in a way that is easy for students to understand.


While these uses for the project and its data are interesting, what will be the most fun is watching what else shakes out as we move ahead.  Discussing interesting ideas in public is probably the #1 reason why I rebooted my blog, and I can’t wait to hear what else people come up with!

Building the DIY Air Quality Monitor – BOM and Pictures

A few days back I posted a bit about a DIY air quality monitoring project I’ve been working on.  That post just outlined why the project started, what we hoped to achieve, the high level design, component selection and software stack.  In the last few weeks the project has moved past the breadboard stage into a real physical prototype.  It’s a little ugly, built on generic perfboard and full of design compromises, but it works like a charm.  Now that it’s working, it’s time to share what went into the build, the bill of materials, and a few pics!

Top of board

On the top of the board I have mounted the Wifi module, the MQ series sensor array, a 6 circuit Molex connector for the particle sensor, a 5V regulator and the Arduino.  The Arduino mates to the perfboard using a bunch of 0.1″ male pin headers.  You can think of the perfboard as just a ridiculously, comically oversized Arduino shield.  The WINC1500 mounts the same way.  The MQ sensor breakouts use a right angle female pin connector.  Why so many connectors?  I like to reuse stuff.  With the way this is put together it is easy to pop modules on / off of the board to be reused in other projects or replaced if they stop working.  One thing to note here is that the Arduino uses wacky spacing for one of its sets of headers.  The spacing between pins 7 and 8 is not the standard 0.1″.  Why?  Who knows, seems silly to me.  I didn’t need one of those sets of headers, so I just left the problematic one with the weird spacing off of the board.

A word on power in this design.  The MQ series sensors all have internal heaters that are required to keep them at the right operating temperature.  These heaters need regulated 5V.  They consume more power than the Arduino has available from its 5V output.  So, 7.5V is fed into the regulator, which provides regulated 5V output for the MQ sensors and the DHT-22.  The same 7.5V is fed into the Arduino’s Vin pin, which powers the Arduino itself.  The other low power items (WINC1500, Sharp particle sensor) are driven off of the Arduino’s regulated 5V output.  While it is possible to run the 7805 regulator without a heat sink for low current loads, my total load was high enough that it needed one.
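To get a feel for why the heat sink matters, remember that a linear regulator like the 7805 burns off the difference between input and output voltage as heat: P = (Vin - Vout) × I.  The Python sketch below runs that arithmetic.  The 0.45A load and the 65 °C/W thermal resistance are illustrative assumptions, not measurements from this board, so check the datasheets for your own parts.

```python
def regulator_dissipation_watts(v_in: float, v_out: float, load_amps: float) -> float:
    """Heat dissipated by a linear regulator: the voltage drop times the load current."""
    if v_in < v_out:
        raise ValueError("a linear regulator cannot output more than its input")
    if load_amps < 0:
        raise ValueError("load current must be non-negative")
    return (v_in - v_out) * load_amps


def temperature_rise_c(dissipation_watts: float, theta_ja_c_per_watt: float) -> float:
    """Rise above ambient for a given junction-to-ambient thermal resistance."""
    return dissipation_watts * theta_ja_c_per_watt


if __name__ == "__main__":
    # Assumed load: 0.45 A total for the sensor heaters (hypothetical figure).
    watts = regulator_dissipation_watts(v_in=7.5, v_out=5.0, load_amps=0.45)
    # Assumed thermal resistance for a bare TO-220 package (hypothetical figure).
    rise = temperature_rise_c(watts, theta_ja_c_per_watt=65.0)
    print(f"{watts:.2f} W dissipated, roughly {rise:.0f} C above ambient without a heat sink")
```

This is also why the input is 7.5V rather than something higher.  Every extra volt on the input is more heat in the regulator for the same load.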


Bottom of board

On the back of the board we find the Sharp GP2Y1010AU0F sensor, the DHT-22 temperature and humidity sensor, and a bunch of ugly globs of solder.  The one thing I don’t like about the Sharp sensor is the lack of real mounting holes.  It does have some little rails, so I used some short plastic standoffs and nuts to sandwich that rail and provide a secure mount.  The Sharp sensor’s data sheet prescribes a specific orientation for mounting.  Once this board is slid into its housing and the housing is stood up, the sensor will be oriented correctly.  The DHT-22 sensor is also mounted on this side of the board.  Why is that when it looks like there is plenty of room to mount it next to the MQ sensor array?  Recall that the MQ sensors have heaters in them.  The first iteration of this board had the DHT-22 right next to the MQ sensors on the other side.  When the MQ sensors came up to temp, the DHT-22 was consistently reading 10-15 degrees higher than it should have been.  Moving the sensor to the other side of the board seems to have corrected that.


Enclosure bottom

This type of enclosure is a pretty standard thing for sensors that live outside.  A louvered radiation shield keeps the rain and sun off of the bits and pieces.  Strong driving rain would probably still find its way in and I don’t want all these parts getting wet, so it will be mounted up under a covered area where it will stay nice and dry.  This particular shield enclosure was designed for an Ambient Weather temperature sensor but it works great for this project.  Luckily at its widest it was just a few mm narrower than the perfboard I used for the project.  Cutting a few little notches in the sides of the enclosure allows the board to easily slide in and out like it was designed to be there.  The oval shape of the enclosure cavity made part layout a little tricky.  The taller parts like the MQ sensors and dust sensor needed to be kept toward the middle so they would have enough clearance.


Bottom plate installed

The bottom plate has a standard power jack that is used to supply the system with power, and three status LEDs that show the state of the system.  Red means that there has been a fault detected and the system has halted.  When that occurs it will restart in 8 seconds or so when the watchdog timer kicks in.  The yellow light signals that the system is starting up, connecting to the network and taking test readings.  Green means that everything has started up and the monitor is successfully connected to the network.  It’s a bit crude, but does a good enough job to indicate the system status without having to hook up a USB cable and look at the serial monitor output.


All buttoned up!

The perfboard slides right in, and then the bottom two louvers are attached to the threaded posts with wing nuts.  Looks nice and clean once it’s all put together and you can’t even see that ugly board.


The posts on top of the enclosure are used to attach it to an L bracket included with the enclosure.  This bracket also comes with some U bolts that make it easy to mount the whole assembly to a pole.


Here’s what was used to build the monitor.  If you have been hacking around on electronics for a while you probably already have some of this stuff just lying around.  If you haven’t and you don’t have a good stock of bits and pieces, now’s a good time to order extras of stuff you know you’ll use a lot.  Passives, wire, that ubiquitous 0.1″ male pin header strip and of course you can never have too many LEDs!

I’ve put the wire in the BOM as well, even though it’s not strictly necessary.  I like the pretinned 24AWG stuff for laying out power and ground busses on one side of the board because it’s easy to solder it down to the pads on the board as you are routing it around. 22AWG stranded wire is good for connecting the board to stuff mounted on the enclosure (like the power jack) where some flexibility is needed.  The 30AWG insulated wire wrap wire is good for signal connections.  The 30AWG wire wrap wire is fragile though, so after I have it all working I like to tack it down with a dab of hot glue.  If you have to do some rework, the hot glue peels off easily enough.



  • 1x 10μF 25V electrolytic capacitor
  • 1x 1μF 25V electrolytic capacitor
  • 1x 220μF 25V electrolytic capacitor
  • 1x 150 Ω resistor
  • 1x 10k Ω resistor
  • 1x  each green, yellow and red LED




Moving from the breadboard to a real prototype was pretty straightforward.  With a good schematic and lots of pictures of the working breadboarded design, translating it to the perfboard was mostly a matter of laying out the components in such a way that they fit neatly in the enclosure.  The schematic and Fritzing diagram will be added to the Github project shortly; they just need a little cleaning up.  If you are going to try a project like this, make sure you have your enclosure and bare board in hand at the same time or you may find that things don’t fit like you expected them to.  I was expecting the enclosure I ordered to have a larger cavity, for example.  Now that the monitor can be moved around without wires popping out everywhere we can move on to some real world testing, data collection and analysis.

Overview of Monitoring and Mapping Air Quality with Arduino, Node, Elasticsearch and Kibana

Almost anybody that lives in a decent sized metro area probably has air quality concerns.  My city has seen a lot of improvement in the past few years, but we still rank as one of the worst in the country for year round particle pollution, and we still have periodic ozone alerts throughout the summer.  The local wisdom says it is especially bad in the “bowl” of a valley in which the city sits. Shortly after Birmingham was founded, it grew rapidly as a steel town.  During this period of growth, the pollution in the Jones Valley was terrible.  According to many, this pollution led those with money and mobility to move up, and eventually over, Red Mountain.  In the 1970s the situation was so bad that a federal judge invoked the first ever use of the Clean Air Act’s emergency clause to have smokestack industry temporarily shut down.

Fast forward to 2017, and I’m looking for a new home.  While driving around house hunting with my incredibly patient spouse, we started talking about the advantages of moving from our loft downtown to the “Over the Mountain” neighborhoods and air quality came up as one potential benefit.  We’ve both heard how bad it used to be, and that better air may have been one of the reasons people moved where they did.  But, is it true?  Is the air quality in these neighborhoods any better?  Should we even factor that into buying a home in the same general metro area?  My wife is a scientist, and we share a desire to make good decisions with data.  Unfortunately, the data we have available on air quality in our metro area isn’t granular enough.  There are few sensors, and they are quite far apart.  This setup is good enough to get regional measurements, but not good enough to test some of our more localized assumptions.  To get what we want, we’re going to have to measure it ourselves.  Commercial measurement equipment is fairly expensive, and doesn’t easily lend itself to automated capture and logging.  Lucky for us, I’ve been having fun hacking on the Arduino platform lately, and there are a ton of great sensors for measuring different aspects of air quality out there for cheap!

Project goals

No project should start without an end in mind.  For this project, the goals are fairly simple:

  1. Measure the most common components of air pollution as accurately, precisely and discretely as possible
  2. Capture the time and location of the measurement
  3. Reliably record all of the measurements in a way that lends itself to easy analysis
  4. Be able to install as a fixed installation to measure trends over time in a single location
  5. Portable enough to take on the road to measure data in many locations in a short period of time
  6. Be able to take the data and use it to drive informed decisions
  7. Learn a few new things about air pollution, Arduino programming and electronics

The core platform

Logically, there are two separate parts to this project.  The first is a sensor platform that needs to be able to connect to the sensors, read data from them, and then send it somewhere.  The second part of the project is an aggregation and analysis platform that collects the measurements, stores the data and provides a way to use it to get useful insights.  This logical separation frees us to use tools that are well suited for each part of the job.  This simple little diagram shows all the pieces and how they fit together:


For the sensor platform I chose an Arduino Mega.  Why a Mega?  Well, there was one sitting in my parts bin, for one.  The Mega has plenty of IO to support a lot of sensors and other modules, and plenty of flash for a big program.  The Arduino Mega has more than enough processing capability to read from sensors and push data over a network, while still being low power enough to potentially run off of a solar panel or battery.  This is important to meet the project goals.  There are plenty of other devices in the Arduino family that could work just fine.

The aggregation and analysis platform was developed on my MacBook, but the final deployment target is a Raspberry Pi 3 Model B.  The Pi can easily run the software stack, is also low power so I don’t feel bad leaving it running for extended periods, and has enough GPIO pins to drive some other devices that will respond to changes in the air quality data.  One thing to keep in mind is that the Pi is an ARM based device (the Pi 3 is typically run with a 32-bit ARMv7 operating system), so anything that gets built on another platform needs to be portable to ARM.  Another point in the Pi 3’s favor is the built in Wifi.

Sensors and measurements

If you look at “real” air quality measurements from government agencies or other organizations, you’ll see a few things that most of them measure.  Particulate matter, ozone, carbon monoxide, VOCs and others are common to see in measurement suites, along with temperature and humidity which affect particle and compound formation.  I don’t expect to get absolute PPM / PPB measurements on par with professional gear, but for this use case that isn’t necessary.  A relative measurement that can be used to compare measurements taken in different areas is good enough.  A little digging on the internet led to a set of sensors that should get measurements that are good enough to compare different areas of town.  This list is by no means comprehensive, but these are the sensors chosen for this project.


The MQ-131 sensor measures ozone in the atmosphere.  Ground level ozone is known to cause a variety of health problems, and is a regular source of air quality alerts in urban areas.  Since my metro area sees multiple ozone alerts each summer, this is one that definitely should be measured.


The MQ-135 sensor is a general air quality sensor that is sensitive to smoke, NOx, CO2, benzene, alcohol and others.  It does not differentiate well, but for the purpose of this experiment a relative measurement of miscellaneous stuff you don’t want to breathe in is good enough.


MQ-7 sensors measure carbon monoxide.  CO is not something you want to breathe in, and is regulated by the EPA.  CO is most worrisome in enclosed indoor environments, but can also be a concern outdoors for sensitive populations.

Sharp GP2Y1010AU0F

Sharp’s GP2Y1010AU0F sensor measures particulate matter in the air using an LED to reflect light off of particles which is then measured by a photosensor.  Fine particles are a known source of health problems.  Typical air quality measurements look at two types of particles:  those that are < 2.5 microns in size (PM2.5), and those that are < 10 microns (PM10).  As far as I can tell the Sharp sensor isn’t sensitive enough to differentiate between the two classes, but for this experiment a total measurement of particulate matter is sufficient.


The DHT-22 is a simple temperature and humidity sensor.  Both temp and humidity can affect the formation of particles and compounds in the air, so it’s worth measuring for use in the air quality calculation.  Measuring temperature will also allow this experiment to map out the urban heat island effect.  This sensor can only be read every 2 seconds or so, but for this project that’s OK.

Connecting to the network

With the sensors selected, connected and happily spitting out data, it’s time to do something with it!  I’d like the sensor array to be located independent of the collection server, which means the sensor array should probably be able to send its data wirelessly.  The first iteration of this project used a CC3000 based Wifi shield.  If you are looking for a reliable wireless connection for your Arduino project, this is NOT it. Nothing against Sparkfun, it looks like a chip level issue and not anything in their design or library.  The CC3000 locked up so often that a watchdog timer became necessary to keep the system running for more than an hour at a time.  At its worst I measured over a hundred watchdog driven resets in a 24 hour period.  No bueno!  Switching to a WINC1500 breakout board from Adafruit helped immensely and now the monitor can run for days without a restart.

Inside the Arduino sketch’s loop() function each sensor is read in turn.  The data is then formatted as a JSON string, ready to be sent to the collection service.
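The real sketch is Arduino code and lives in the Github project.  Purely to illustrate the shape of one reading on the wire, here is a small Python sketch of the read-then-format step.  The field names, the device id and the payload layout are all assumptions made up for this example, not the project's actual schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def format_reading(sensor_values: dict, device_id: str = "aq-monitor-1",
                   timestamp: Optional[datetime] = None) -> str:
    """Bundle one pass over the sensors into a JSON string.

    sensor_values maps a (hypothetical) sensor name to its raw reading.
    """
    when = timestamp or datetime.now(timezone.utc)
    payload = {
        "device": device_id,            # hypothetical field name
        "timestamp": when.isoformat(),  # ISO 8601, easy for a datastore to parse as a date
        "readings": dict(sensor_values),
    }
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # Made-up values standing in for one pass through loop().
    print(format_reading({
        "mq131_ozone": 312,
        "mq135_air_quality": 145,
        "mq7_co": 98,
        "dust": 0.04,
        "temperature_c": 24.5,
        "humidity_pct": 61.0,
    }))
```

Keeping every reading from a single pass in one document makes it easy to compare sensors against each other later, since they all share the same timestamp.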

Collecting and analyzing the data

Now that the sensor array is hooked up to the network and generating data, how can it be collected and analyzed?  This turned out to be the simplest part of the project thanks to node.js and Elasticsearch.  The sensor array formats the data as JSON and sends it along to a simple REST API built on node.js and Express.  This API in turn saves the data out to Elasticsearch for analysis.  The combo of node.js and Elasticsearch is lightweight and quite powerful.  Using Elasticsearch also gets Kibana as part of the deal, providing an easy way to search, slice and graph the results.  Simple and effective.  Kibana is great for displaying the graphs, but I also wanted to show running averages, minimums and maximums in an easy to digest way.  Another REST API provided by node.js accomplishes this using Elasticsearch aggregations.  That API is called by a simple web front end built in Angular.  Finally, I feel like this project is fully buzzword compliant!  Here’s what it looks like.  Not the prettiest thing around (yet), but it is effective!
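The project's API is written in node.js, but the Elasticsearch request body looks the same whatever language sends it.  As a rough sketch in Python, this builds a query using the stats aggregation (which returns count, min, max, avg and sum for a numeric field) and pulls the interesting numbers back out of the response.  The field names and the 24 hour window are assumptions for illustration, not the project's actual mapping.

```python
def build_stats_query(fields, since="now-24h", time_field="timestamp"):
    """Build a search body that returns only aggregations for recent readings."""
    return {
        "size": 0,  # skip the individual hits, only the aggregations are needed
        "query": {"range": {time_field: {"gte": since}}},
        "aggs": {f"{name}_stats": {"stats": {"field": name}} for name in fields},
    }


def summarize(response, fields):
    """Extract min, max and avg per field from an Elasticsearch search response."""
    summary = {}
    for name in fields:
        stats = response["aggregations"][f"{name}_stats"]
        summary[name] = {key: stats[key] for key in ("min", "max", "avg")}
    return summary


if __name__ == "__main__":
    fields = ["readings.mq7_co", "readings.dust"]  # hypothetical field names
    print(build_stats_query(fields))

    # A hand-written response in the shape Elasticsearch returns, for illustration only.
    fake_response = {"aggregations": {
        "readings.mq7_co_stats": {"count": 3, "min": 90.0, "max": 110.0, "avg": 100.0, "sum": 300.0},
        "readings.dust_stats": {"count": 3, "min": 0.02, "max": 0.06, "avg": 0.04, "sum": 0.12},
    }}
    print(summarize(fake_response, fields))
```

Letting Elasticsearch do the math means the API never has to pull the raw readings across the wire, which matters on something as small as a Pi.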


The code

All of the code used for this project is available on my Github. Right now it is just a dump of the artifacts including the Arduino sketch and a simple node.js + Angular application to get aggregate statistics on the data. In the future this will include Fritzing diagrams and perhaps schematics and board layouts.  Oh, and some useful docs that actually explain everything!

Next steps

The next step for this project is to get a GPS hooked up so location stamps can be added to each reading.  As soon as the GPS is working, we’ll take the sensor array for a long drive around town capturing readings as we go.  Then it should be straightforward to take all of the captured data and map it out to see if our assumptions about air quality across Birmingham neighborhoods were correct or not.  Mappable data also opens the door to other interesting analysis, such as correlating readings with industry, income, etc.


Taken together the Arduino, node.js, Elasticsearch and Kibana provide all the tools an amateur citizen scientist needs to take some basic relative air quality measurements.  I hope others find this useful and share their improvements and/or data with others interested in what they are breathing.  In the near future I will publish my findings along with more detailed dives into sensor calibration and other related topics.  In the meantime, happy hacking!



Getting Started with an Activiti and IoT Example

In the second article in this series, we discussed several IoT protocols that might be a good choice to use with Activiti, and put down some questions that should be answered to guide the selection.  In this post, we’ll discuss the example use case, get answers to our questions, briefly introduce the hardware (covered in depth later), and discuss the protocol stack and how it interacts with our BPM engine.

The Use Case

It’s hard to have a meaningful conversation about the interplay between IoT and Activiti without a concrete example.  For the purpose of this series, we’ll look at something basic:  A water sensor with an audible / visible alarm, and a remote shutoff valve.  The interaction between the components is fairly simple.  Water where it shouldn’t be causes an alarm to sound and a message to be sent.  This message triggers a remote shutoff valve to turn off, and notifies maintenance that there is a problem.  When maintenance accepts the task, the alarm is silenced.  Finally, when maintenance completes their task indicating that the leak is fixed, the remote shutoff valve turns the water back on.  The shutoff valve also needs to send an acknowledgement that the water is turned back on to complete the process.  The messages will originate from Arduino devices, automated and human tasks, and all of the activities will be coordinated by an Activiti process.
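The flow described above can be written down as a small state machine before any BPMN is drawn. This is only a sketch for reasoning about the order of events: the state names, event names and command strings are made up for illustration, and the real process would be modeled in Activiti rather than in Python.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LEAK_DETECTED = auto()       # alarm sounding, water off, maintenance notified
    TASK_ACCEPTED = auto()       # alarm silenced, water still off
    AWAITING_VALVE_ACK = auto()  # leak fixed, valve told to reopen
    DONE = auto()


# (current state, event) -> (next state, commands to send out)
# All event and command names here are hypothetical.
TRANSITIONS = {
    (State.IDLE, "leak_detected"):
        (State.LEAK_DETECTED, ["valve:close", "maintenance:notify"]),
    (State.LEAK_DETECTED, "task_accepted"):
        (State.TASK_ACCEPTED, ["alarm:silence"]),
    (State.TASK_ACCEPTED, "task_completed"):
        (State.AWAITING_VALVE_ACK, ["valve:open"]),
    (State.AWAITING_VALVE_ACK, "valve_open_ack"):
        (State.DONE, []),
}


def step(state, event):
    """Return (next_state, commands) or raise if the event is out of order."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


def run(events):
    """Feed a sequence of events through the machine, collecting commands."""
    state, sent = State.IDLE, []
    for event in events:
        state, commands = step(state, event)
        sent.extend(commands)
    return state, sent


if __name__ == "__main__":
    print(run(["leak_detected", "task_accepted", "task_completed", "valve_open_ack"]))
```

Writing the transitions as a table makes it obvious which messages flow toward devices and which events have to come back before the process can finish.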

This simple example process encompasses both M2M (sensor -> shutoff valve) communication and M2P (sensor -> staff, staff -> shutoff valve) communications.  It also requires interaction between the process and IoT devices at multiple levels.  Our water sensor needs to be able to start a process instance.  Within that instance, we have tasks that need to send messages back to the sensor (to silence its alarm) and tasks that need to send messages to the shutoff valve (to turn the water on / off).  Looking back to the last post in this series, we had a few questions that can guide our early design decisions.  Now that we have a use case basically defined, we can get some answers:

  1. Do we just need to start a process, or do we need to be able to interact with tasks as well? – Both, for this case.
  2. Do we need Activiti to be able to push messages back to our device? – Yes, we do.
  3. Is there more than one type of device that may interact with a given workflow? – Yes, both water sensors and shutoff valves.
  4. Is there more than one type of workflow that may be started from a given class of device? – Not at this time.
  5. What triggers the interaction with the process?  Is it a single condition?  Is it a pattern? – A single condition will trigger the process start.
  6. How reliable does this need to be?  Is some loss or delay of messages tolerable? – We need guaranteed message delivery.

Making Some Choices

Based on the answers to the questions above, it’s clear that we need a protocol that can guarantee message delivery and allows easy bidirectional communication between in-flight processes and devices.  The hardware in question is going to be a couple of Arduinos: one for the sensor/alarm, and one for the valve.  The Arduino isn’t the most powerful device, so something lightweight is in order.  MQTT fits the bill quite nicely.  It also helps that there are a few good MQTT client libraries available for the Arduino.  In addition to the client, MQTT also requires a broker to manage topics, clients and message delivery.  The Eclipse Mosquitto broker supports MQTT 3.1.1, is open source (keeping with the open theme of this project), and is under active development, so it makes a good choice.  The final piece of the puzzle is missing and needs to be built:  We need a link between Activiti and MQTT.

Connecting Activiti and MQTT

Activiti is built on Java + Spring.  While it is possible to build some kind of a standalone gateway server to act as middleware between Activiti and Mosquitto in just about any language, the best impedance match here is to implement it in Java as an Activiti extension.  Like with the broker, it’s Eclipse to the rescue with the Paho project.  Paho offers MQTT client libraries for a number of languages including Java, Python, JavaScript, Go and C#.

There are two components missing from the picture so far.  Both depend on having an MQTT client running that is connected to our broker and subscribed to the necessary topics.  The first is a component that listens to MQTT for a new message that indicates a water leak.  This component also needs to know how to talk to Activiti so it can start a process instance. The second piece is a component that listens for Activiti events that need to be translated into MQTT messages and publishes them to the appropriate topic.  For the sake of simplicity, both of these components will be wired into Activiti as Spring beans, taking advantage of the hooks built into Activiti.  A bean that uses the process engine configuration hook point will start the connection to the MQTT broker to listen for messages that should start a process or affect tasks in an existing process, and a bean using the process engine event listener hook point will listen for the right Activiti events and pass those back through to the MQTT broker.
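The two components described above can be sketched without a real broker or engine. The version below is in Python purely to show the direction of each flow; the real extension would be Java beans using Paho and Activiti's hook points. `InMemoryBroker`, `FakeEngine`, the topic names, the process key and the event names are all stand-ins invented for this sketch.

```python
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for an MQTT broker: exact-match topics, synchronous delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every (topic, payload) published, kept for inspection

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        self.log.append((topic, payload))
        for callback in list(self._subscribers[topic]):
            callback(topic, payload)


class FakeEngine:
    """Stand-in for the process engine: starts instances and emits events."""

    def __init__(self):
        self.instances = []
        self._listeners = []

    def add_event_listener(self, listener):
        self._listeners.append(listener)

    def start_process(self, key, variables):
        self.instances.append((key, variables))
        return len(self.instances)  # pretend instance id

    def emit(self, event_type, data):
        for listener in self._listeners:
            listener(event_type, data)


# Hypothetical topic layout and event-to-command mapping.
LEAK_TOPIC = "building/water/leak"
EVENT_TO_TOPIC = {
    "TASK_ASSIGNED": ("building/alarm/cmd", "silence"),
    "TASK_COMPLETED": ("building/valve/cmd", "open"),
}


def wire_bridge(broker, engine):
    # Component 1: MQTT message in -> start a process instance.
    def on_leak(topic, payload):
        engine.start_process("waterLeak", {"sensor": payload})

    broker.subscribe(LEAK_TOPIC, on_leak)

    # Component 2: engine event out -> MQTT message to the right topic.
    def on_engine_event(event_type, data):
        target = EVENT_TO_TOPIC.get(event_type)
        if target is not None:  # ignore events no device cares about
            broker.publish(*target)

    engine.add_event_listener(on_engine_event)


if __name__ == "__main__":
    broker, engine = InMemoryBroker(), FakeEngine()
    wire_bridge(broker, engine)
    broker.publish(LEAK_TOPIC, "sensor-1")
    engine.emit("TASK_ASSIGNED", {})
    print(engine.instances, broker.log)
```

Keeping the event-to-topic mapping in a table rather than in code is one way to leave room for the more general framework idea, since another protocol would only need a different publish function.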

The extension described above can be generalized a bit.  XMPP and other IoT messaging protocols can be connected in much the same way.  It therefore makes sense to build a framework for connecting IoT pub/sub protocols to Activiti instead of just building a single integration for a single protocol.  This first example isn’t going that far, but the initial design should keep a framework in mind and once MQTT is working we’ll explore it further.

In the next article in this series we’ll (finally!) show the code, play around with a test MQTT client and see how we can actually start processes from MQTT messages.


Activiti and IoT, Choosing the Protocol Stack(s)

In part 1 of this series, we briefly discussed why it makes a ton of sense to use BPM to manage processes in an IoT world.  The pieces are all there, but how, exactly, can we make this work?  The answer to that question is, of course, “it depends”.  What kind of device are you connecting?  What sort of messages does it send or receive?  How much, if any, control do you have over these factors?  If you are building something from scratch with an Arduino or Raspberry Pi then it’s likely you’ll have a lot of control over what the device sends, how often, and what protocol it uses.  If you are deploying an off the shelf device your options will likely be much more limited or simply nonexistent.  A good place to start is to look at what kind of data you are working with and what protocol is used.  In my experimentation, I have been working with three protocols that are widely used in the IoT world: HTTP(S), XMPP and MQTT.


HTTP is a beautiful thing.  It’s easy to implement, easy to test, flexible, ubiquitous and mature.  You can use it to send any kind of data you want.  It has a nice set of verbs that map to the kinds of operations we might need.  HTTP is supported everywhere, with robust client and server implementations available in any language on any platform.  Done right, HTTP is extremely scalable.  Looking at the negatives, HTTP is pretty heavyweight when compared with some other IoT options and doesn’t offer multiple levels of quality of service.  This makes it reliable, but that reliability is not always a requirement.  It’s also based on a request / response model, which isn’t always a good fit for IoT applications.  If you want a long running Activiti process task to send messages back to your device, it will need to be both an HTTP client and server, which is a lot to ask of a very low power device on an unreliable network.


XMPP was originally developed for instant messaging and presence applications as the protocol used by Jabber.  In subsequent releases it evolved into an IETF standard.  Today, XMPP is well defined but not stagnant thanks to an active community.  There are multiple client and server implementations across many platforms and languages.  The message format is flexible and extensible via XML.  XMPP supports a publish / subscribe model (via an extension), presence and a fairly rich set of features.  It is generally lighter than HTTP, but is still weighty on very low powered devices due mostly to the fact that it is a text based protocol.  XMPP, like HTTP, does not support multiple QoS levels.  This makes it suitable for IoT devices that have a fair amount of power and need reliable message delivery.


MQTT is the lightest of the protocols that I have been researching.  It is an OASIS standard supporting a publish / subscribe model, and has a number of client and broker implementations available.  MQTT isn’t quite as widely used or supported as either HTTP or XMPP, but it is mature and stable.  MQTT is a binary protocol designed for constrained devices operating on unreliable networks, which means it is very well suited for IoT applications like remote sensor networks.  Support is included for multiple levels of QoS ranging from “fire and forget” to guaranteed delivery.  It isn’t great for large amounts of data, and is not as flexible or extensible as either HTTP or XMPP.
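To get a feel for how small the binary framing is, here is a minimal sketch that encodes an MQTT 3.1.1 PUBLISH packet by hand: one flags byte, a variable-length "remaining length", a length-prefixed topic, a packet identifier when QoS is above 0, then the payload. It is for illustration only; a real device would use a client library, and the topic and payload in the usage lines are invented.

```python
import struct


def encode_remaining_length(n):
    """MQTT variable-length integer: 7 bits per byte, high bit = more follows."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit
        out.append(digit)
        if n == 0:
            return bytes(out)


def encode_publish(topic, payload, qos=0, packet_id=None, retain=False):
    """Encode a PUBLISH packet (DUP flag left unset)."""
    if qos not in (0, 1, 2):
        raise ValueError("qos must be 0, 1 or 2")
    topic_bytes = topic.encode("utf-8")
    body = struct.pack("!H", len(topic_bytes)) + topic_bytes
    if qos > 0:
        # QoS 1 and 2 carry a packet identifier so delivery can be acknowledged.
        if packet_id is None or not 1 <= packet_id <= 0xFFFF:
            raise ValueError("qos > 0 requires a packet_id in 1..65535")
        body += struct.pack("!H", packet_id)
    body += payload
    first_byte = 0x30 | (qos << 1) | int(retain)  # 0x3 in the high nibble = PUBLISH
    return bytes([first_byte]) + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    # Hypothetical topic and payload.
    print(encode_publish("a/b", b"hi").hex())
    print(encode_publish("a/b", b"hi", qos=1, packet_id=10).hex())
```

A two-byte payload on a three-character topic comes to 9 bytes on the wire at QoS 0, and moving to QoS 1 adds only the two-byte packet identifier.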


There are a huge number of protocols for IoT devices both open (to varying degrees) and proprietary.  For my purposes I’m not even going to look at the proprietary protocols because proprietary protocols are dumb.  The real value of IoT can only be realized through openness and integration, not through secrecy and walled gardens.  Some other open protocols that may be worth a look depending on your needs include AMQP and STOMP.  These may get their own articles in the future, but for now I’m going to focus on the three described above.

Connecting to Activiti

Assuming we now have a device and a use case in mind, we know what its data and delivery reliability requirements look like and we know what protocol we are going to use, how can we get this thing talking to Activiti to either start a new process or interact with one that is already in-flight?  Again, that depends.

At its simplest, your IoT device could simply call one of several REST APIs in Activiti to start a workflow.  You could call one of the workflow initialization APIs directly to start a workflow using a message or a process definition.  This sort of simple point to point integration works, but it is hard to scale without excess complexity, hard to maintain and brittle. A better option, if you are going the HTTP route in your IoT project, might be to push the message onto some kind of bus.  Either way, if your device speaks HTTP, getting a basic connection to Activiti that you can use to start a workflow or interact with tasks is straightforward.  What about other options?
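As a rough illustration of the point to point option, the sketch below builds (but does not send) a request that starts a process instance by key. The `runtime/process-instances` path and the body shape follow my reading of the Activiti REST documentation and should be checked against the version in use; the base URL, process key and variable names are assumptions, and authentication is left out.

```python
import json
import urllib.request


def build_start_request(base_url, process_key, variables):
    """Build a POST request that asks the engine to start a process instance.

    base_url, process_key and the variable names are placeholders.
    """
    body = {
        "processDefinitionKey": process_key,
        "variables": [{"name": k, "value": v} for k, v in variables.items()],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/runtime/process-instances",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_start_request(
        "http://localhost:8080/activiti-rest/service",  # assumed deployment path
        "waterLeak",                                     # hypothetical process key
        {"sensorId": "basement-1"},
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it would be: urllib.request.urlopen(req), once credentials are added.
```

Even in this small form the brittleness shows: the device has to know the engine's URL, the process key and the variable format, which is exactly the coupling a bus or gateway removes.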

Using another protocol with Activiti will require some sort of bridge or gateway since Activiti doesn’t speak MQTT or XMPP natively.  Each of these protocols uses a different architecture and will require a different approach.  Even so, we can map out some high level requirements by asking a few questions about what we are trying to do with our IoT project:

  1. Do we just need to start a process, or do we need to be able to interact with tasks as well?
  2. Do we need Activiti to be able to push messages back to our device?
  3. Is there more than one type of device that may interact with a given workflow?
  4. Is there more than one type of workflow that may be started from a given class of device?
  5. What triggers the interaction with the process?  Is it a single condition?  Is it a pattern?
  6. How reliable does this need to be?  Is some loss or delay of messages tolerable?

In part 3 of this series, we’ll outline a simple use case, introduce our example IoT device and code, answer the questions above for our application and get a basic gateway working.

IoT, Activiti and the Workflow of Things

I’m not going to waste much space here writing about the broad, general impact of the Internet of Things.  Countless column inches have already been spent detailing the number of devices, the amount of data and the potential size of the market.  We get it, it’s huge, still growing and transformative.

One of the most interesting facets of the Internet of Things is the complex flows that result from simple events.  A single out of spec sensor reading can trigger a cascade of machine and human actions.  Deviations from known patterns of data might mean nothing, or may signal some kind of catastrophic event.  How can we tell the difference?  How can we orchestrate a response to conditions that are signaled by our IoT data streams in a consistent way whether or not human intervention is required?  How can we look at these responses in aggregate and find ways to handle them more efficiently, find opportunities for further automation or discover exceptions that our current process does not cover?

This is not a new problem.  We have been dealing with the challenge of coordinating activities triggered by signals or messages at scale for years.  While the source of the messages (high volume IoT event streams) may be new, the techniques for responding to them are not.  We can leverage many of the same tools and patterns that have been used successfully in other spheres to coordinate actions that arise from IoT events.  Specifically, we can take advantage of a scalable, high performance workflow engine to consume the output from IoT devices, decide what messages or message patterns indicate an action is required and then execute a process in response.  The beauty of this approach is that it gives us a clean separation between the underlying data and the process design, using tools and concepts that are already well understood by business and technical users alike.  Our process engine can intelligently interact with other IoT devices using automated workflow tasks (covering many M2M use cases) as well as tasks that require human intervention (solving for many M2P use cases) or both in the same process.  Finally, this approach gives us access to detailed analysis of both in-flight and completed processes without having to reinvent anything.  In short, IoT and BPM seem destined to connect and in some ways, converge.

This is just the first in a series of articles.  Over the course of the series we’ll explore this idea in more depth, discussing how a number of IoT protocols can play in the BPM world, how we can structure bidirectional communication between our process engine and IoT devices, where we may need new components, and how to use the insights BPM analytics gives us to make sense of trends in our IoT data.  When we need to build specific examples, we’ll make use of Alfresco Activiti. Activiti is well suited for this kind of thing.  It is lightweight, super fast and scalable, and comes with analytics baked in.  Most importantly, it is open source and easily extensible which we’ll need to build some of our examples.  It’s a perfect fit.

Stay tuned for the next article in this series, “Activiti and IoT, Choosing the Protocol Stack(s)”.

Rebooting the blog

I’ve realized over the last year or so that I miss writing.  I miss sharing the stuff I build with other people, and especially miss the insights that come from public conversations.  So with that in mind, I’m relaunching my blog!  Let’s kick it off with a few answers to questions that nobody is asking:

Why did you stop blogging?  Well, I stopped because maintaining my own site was taking up too much time.  One missed Drupal security patch and the site was overrun with malware and mail relays, requiring days of cleanup and downtime.  Creating content and building new things is a lot more fun than dealing with the aftermath of a hack, or database upgrades, or applying security patches for hours on end.  Lesson learned:  Use a managed platform.

What about all your old content?  There is a static copy of the old site kicking around which may or may not get posted in the future if people ask for it.  All of my project code remains posted on my GitHub, and that’s where new code will land in the future.

What are you going to write about now?  I still work for the world’s coolest ECM/BPM company, so you can expect to see a lot of Alfresco related articles.  In the time that has passed since my previous blog burned to the ground I’ve taken on a new role in the company.  In keeping with that new role a lot of the new content will be focused on customer success, service delivery and the like.  But, a new role doesn’t mean core interests have changed!  You can still expect to see a lot of how-tos, best practices and deep technical content from the ECM/BPM world.  Maybe even a guest article or two if any of my immensely talented friends and colleagues can be convinced.

So just work stuff?  That’s it?  Of course not!  Work is a huge part of all of our lives, but that doesn’t mean that’s all there is.  I’ll also be regularly blogging about IoT using the Raspberry Pi / Arduino (my current obsession), whiskey, travel, science, technology, food, sustainability and anything else that seems interesting at the time.  In other words, expect a heavy dose of stuff that shakes out of a person that has a short attention span.