Process Notification with PushMon

Notice: This is a discussion of process notification that includes a review of the PushMon process notification product. At the time of this writing, I am using the free beta of PushMon. I have no financial interest in the company nor am I receiving any compensation. I can’t tell you whether PushMon is a good solution for your needs or not, so do your own due diligence.

If you are anything like me, you love discovering new, better, easier ways to do things. And every once in a while, that new, better, easier way to do things is so profoundly simple that we all stare and wonder, Why did I put up with the old way?

The Problem: Process Notification

This is a review of PushMon, a new way I recently discovered to handle notification for automated computing processes.

Many of the systems we use in analytics, decision support, and business intelligence rely on automated computing processes for workflow, data movement, background processing, and so on. But how do you know whether a given process succeeded or failed?

Notification vs. Logging

Process notification is not the same as logging. Logging of the success or failure of an automated process should be a given – always do it. A log gives an essential historic record of process outcomes. But a log only informs someone who looks at it. Notification differs from logging in that it pushes information to a recipient.

Approaches to Process Notification

No Notification

The worst way to do this is to have no process notification and hope someone notices when the process fails. Surely no one would be that stupid, right? Well, uh, actually… I made that mistake once, and let me tell you, it wasn’t fun when it blew up in my face.

On-Success Notification

Adding on-success notification at the end of the process is a step in the right direction. As the name indicates, on-success notification occurs only when the process succeeds and informs the recipient that the process succeeded.

On-success notification relies on a couple of assumed paths of action on the part of the recipient. The first is that the recipient will take positive action on the notification. This is the case, for example, when a data steward must review data before it is released to users. But unless the recipient is paying attention to the notification and acting on it, the notification is nothing more than spam.

The alternative action path assumes that the recipient will notice if they don’t get the notification. In that case it’s really a passive, unreliable form of failure notification.

On-success process notification is easy to implement. You simply add a notification step at the end of your process. If the rest of the process succeeds, it runs the notification last. There is the risk that the process could succeed but the notification fail to be transmitted, but that can be largely mitigated with standard message handling techniques.
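
For illustration, a bare-bones on-success notification tacked onto the end of a Python job might look something like the sketch below. The SMTP host and the addresses are placeholders, and this is only a sketch of the idea, not a hardened implementation.

# Sketch: send an on-success email at the end of a job.
# smtp.example.com and the addresses are placeholders; production code should
# also log the outcome and handle transmission failures (the risk noted above).
import smtplib
from email.mime.text import MIMEText

def notify_success(job_name):
    msg = MIMEText("%s completed successfully." % job_name)
    msg["Subject"] = "SUCCESS: %s" % job_name
    msg["From"] = "jobs@example.com"
    msg["To"] = "steward@example.com"
    server = smtplib.SMTP("smtp.example.com")
    try:
        server.sendmail(msg["From"], [msg["To"]], msg.as_string())
    finally:
        server.quit()

# ... run the process steps, then ...
notify_success("Nightly data load")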

On-Failure Notification

On-failure process notification is a much better approach than relying on a user to notice that the on-success notification didn’t arrive.

But how do you make the on-failure notification solution bulletproof? If you build failure checking into your process, that’s fine as long as (1) the process that initiates the failure checking doesn’t fail and (2) the process that does the notification doesn’t fail. So ideally, the notify-on-failure solution should have 3 key characteristics:

  1. The notification process should be disconnected as much as possible from the original automation process – different process, different software, different server, and ideally not on the same network.
  2. The solution should provide notifications when an explicit failure is trapped.
  3. The absence of a success notification should also trigger a notification.

In contrast to the simplicity of on-success notification, on-failure notification requires some complexity and forethought that makes it less than idiot-proof. And realistically, knowing that a process failed is usually more critical than knowing that it succeeded.

A Solution: PushMon

Enter PushMon. The new, better, easier, profoundly simple solution.

I’ve started using the PushMon online service to quickly and easily create process notifications that meet our criteria for reliable on-failure notification.

PushMon provides a URL for each process you want to monitor. (Yes, PushMon only works for processes that have Internet access – so that’s one constraint to consider.) Let me walk you quickly through how it works. (This is my take based on my experience – check the instructions on the PushMon web site.)

Once you have created your PushMon account, you just go to the Create URL screen and set up three things:

  1. The “Alert me by” method(s) for notification, including email, phone call, SMS, Twitter, URL, Google Talk, and Yahoo Messenger. You can mix and match these as needed.
  2. The “If I don’t ping the URL” interval. The following are supported:
    • every day
    • every weekday
    • every weekend
    • every endOfMonth
    • every hour
    • every 15 minutes
    • every 30 minutes
    • every 6 hours
    • every 12 hours
    • every Mon-Sun
    • every 1st-31st
    • Time of day options for daily schedules
  3. The “Give this URL the name” text entry, which simply provides a label used to identify the process URL within PushMon.

[Screenshot: PushMon Create URL screen for process notification.]

Once you submit the URL settings, PushMon gives you a URL. In the example I created for this review, the URL is shown below.

[Image: PushMon process notification URL]

PushMon also provides sample code showing your custom URL in various programming languages. To date I have only used curl, because I prefer to keep the notification call isolated from the process code itself. So I just put this call (with the right URL for that process) at the end of my automation script.

curl -L "${}"

So now if PushMon doesn’t get a success notification every day (in this case), it will alert me using the notification methods I set up for this URL.
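
If you would rather ping from inside a Python script instead of shelling out to curl, a sketch like the following should work. The URL is a placeholder for the one PushMon generates for your process, and this is my own sketch, not PushMon-provided sample code.

# Sketch: ping the PushMon URL from Python at the end of an automation script.
# PUSHMON_URL is a placeholder; substitute the URL PushMon generated for this process.
import urllib2  # on Python 3, use urllib.request instead

PUSHMON_URL = "http://example.invalid/replace-with-your-pushmon-url"

def ping_pushmon():
    try:
        urllib2.urlopen(PUSHMON_URL, timeout=10)
    except Exception:
        # If the ping can't get through, PushMon will alert on the missed check-in anyway.
        pass

# ... process steps succeed, then ...
ping_pushmon()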

Alert On Demand

PushMon also supports on-demand alerts with custom error messages.

curl -L "${,+we+have+a+problem.}"


PushMon also gives a recent history chart, showing the last 10 successful pings and the last 10 alerts sent.

Bottom Line

PushMon is not the only way to solve the notification problem, but it’s in the running to be the best.

There are sophisticated enterprise workflow tools that have a lot more features. But PushMon wins hands down on ease of use. It takes about a minute to set up a URL, a couple minutes to write (or copy/paste) the code into your process, and you are done.

And (at least while it’s in free beta) PushMon’s price can’t be beat.

Obviously I’m a fan. What do you think?

21: Email Analytics

Review of Text Analytics Concepts

This episode walks through building a simple email analytics program. To get the conceptual foundation, check out the previous installment: Episode 20: Unstructured Data & Text Analytics

Unstructured Content with Metadata

Much of what is called “unstructured data” fits this category.

e.g. A Twitter post has

  • Unstructured content
    • The text of the tweet
  • Metadata
    • User account
    • Date/Time
    • Client app/device
    • IP Address
    • Location (maybe)

Targeted content within indeterminate structure

The known structure is defined outside the content:

e.g. The Acme Widget Corp is monitoring Twitter looking for tweets that contain:

  • Any reference to Acme Widget Corp (in various forms)
  • Tags (@ or #) that are relevant
  • Positive or negative sentiment
  • Actionable feedback

Email Analytics: Goals of the Exercise

  • Conceptual
    • Provide a simplified end-to-end model of unstructured data visualization
  • Functional
    • Provide an example of email analytics that includes both the metadata and targeted analysis of the unstructured content (but not advanced NLP).
  • Technical
    • Avro: serialization of emails
    • Pig: crunching the data
    • D3.JS: visualization in a browser with JavaScript

With nothing but a terminal window…

[Check out this Dilbert cartoon.]


  • I want to know who is using the most “Big Data Buzzwords”, using my own email account.
  • This requires that I perform email analytics using both the metadata and content of the emails.
  • The actual analysis performed is very simple, but the email analytics techniques create a foundation I could use for more complex analysis by layering on other techniques.

The buzzwords I counted:
  • hadoop
  • big data
  • analytics
  • cloud
  • predictive
  • data science
  • data scientist
  • nosql
  • unstructured data
  • data visualization
  • data discovery

I started with the book Agile Data Science by Russell Jurney. It sparked some ideas and pointed me in a few directions, but in the end it wasn’t much help:

  • I couldn’t get many of the examples to work.
  • The book focuses on analyzing email metadata, but offers little on email analytics that includes the unstructured email body.

There are plenty of online resources, though, so I combined a couple of ideas from the book, some of my own, and some web resources to do email analytics covering both the metadata and the content.

Steps of the Process

Data Prep

  • Capture emails & clean the data with Python
    • Stream the emails from the source
    • Parse the metadata and content
    • Pass the parsed content to Avro
  • Serialize the emails with Avro to a consistent schema
    • Serialization is the translation of content into a storage format that is semantically identical to the original – allowing the original to be reconstructed from the translation
    • Apache Avro is an open source data serialization system maintained by the Apache Software Foundation.
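
As a rough illustration of the data prep step, here is a minimal sketch of streaming messages out of a local mbox file and writing them to Avro with the Python avro package. The schema, field names, and file paths are illustrative placeholders, not the exact schema used in the exercise, and the calls shown follow the Python 2 avro API.

# Sketch: parse emails from a local mbox and serialize them to Avro.
# Schema, field names, and file paths are illustrative placeholders.
import mailbox
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter

SCHEMA = avro.schema.parse("""
{
  "type": "record", "name": "Email",
  "fields": [
    {"name": "message_id", "type": "string"},
    {"name": "from_addr",  "type": "string"},
    {"name": "date",       "type": "string"},
    {"name": "body",       "type": "string"}
  ]
}
""")

writer = DataFileWriter(open("emails.avro", "wb"), DatumWriter(), SCHEMA)
for msg in mailbox.mbox("inbox.mbox"):
    # Multipart messages need more careful handling; this keeps the sketch short.
    payload = msg.get_payload(decode=True) or ""
    if isinstance(payload, bytes):
        payload = payload.decode("utf-8", "ignore")
    writer.append({
        "message_id": msg.get("Message-ID", ""),
        "from_addr":  msg.get("From", ""),
        "date":       msg.get("Date", ""),
        "body":       payload,
    })
writer.close()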


Crunch the Data with Pig

From the Apache Pig site: “Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.”

  • I used Apache Pig to group the emails and process the buzzword scores
  • A Pig UDF (in Python) returns the actual count (a sketch of the counting logic follows this list)
  • Store the grouped scores in JSON format (sort of)
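
The core of the buzzword counting is simple enough to show here. This is a sketch of the kind of logic the Python UDF wraps; the function name is illustrative, and the Pig registration wiring is omitted.

# Sketch of the buzzword-counting logic a Python UDF for Pig might wrap.
# Note: this naive substring approach double-counts overlapping terms
# (e.g. "data scientist" also matches "data science").
BUZZWORDS = ["hadoop", "big data", "analytics", "cloud", "predictive",
             "data science", "data scientist", "nosql", "unstructured data",
             "data visualization", "data discovery"]

def buzzword_count(body):
    """Return the total number of buzzword occurrences in one email body."""
    if not body:
        return 0
    text = body.lower()
    return sum(text.count(word) for word in BUZZWORDS)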


Visualize with D3.JS

  • I created some simple HTML pages using the D3.JS JavaScript library to generate SVG. Obviously, given my background in Business Intelligence, I don’t recommend hand-coding data visualization in web pages, but for this exercise it was consistent with the stated goals and approach.

[Chart: Email Analytics: Buzzword Frequency]

[Chart: Email Analytics: King of the Buzzwords]

Tip: Python provides an instant HTTP Web Server:

python -m SimpleHTTPServer [optional port number]
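
(On Python 3 the equivalent is python -m http.server [optional port number]; both default to port 8000 if you don’t pass one.)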

This is a great way to run web pages for development and local use with no setup or configuration required. Listen to the audio webcast for more explanation of how to use it.

Key Take-aways

Apache Pig is very cool

It needs to mature, but conceptually it’s the best of SQL and procedural scripting.

Programmers always have to “sweat the small stuff,” like…

  • Data parsing & cleanup
    • Avro solves the easy part of serializing emails… But I still had to hand-roll the parsing. [I used Python]
  • Character set inconsistency
    • Python and Pig both handle UTF-8, but when my Pig script calls a Python UDF, it chokes on passing non-ASCII characters in the parameter value. [I had to remove them in the data cleanup; one way to do that is sketched after this list.]
  • Data format inconsistency
    • Pig and D3.JS don’t handle JSON and CSV files the same way
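
For the character-set issue, one simple way to strip non-ASCII characters during cleanup is sketched below. This is an illustrative approach, not necessarily the exact code from the repository.

# Sketch: drop non-ASCII characters so values pass cleanly into the Python UDF call.
def to_ascii(text):
    if isinstance(text, bytes):
        text = text.decode("utf-8", "ignore")
    return text.encode("ascii", "ignore").decode("ascii")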

Get the code

The GitHub repository includes the code for my email analytics example, but it does not include the content, because I used a set of emails from my personal mailbox and I didn’t get the permission of all the senders to use their private communication.

20: Unstructured Data & Text Analytics

Some friends and colleagues of mine formed a Big Data Bootcamp to share knowledge and learn together about some of the current and emerging technologies in big data and analytics. They asked me to present two sessions on analytics for text and unstructured data. This episode of the podcast is based on the first of those Big Data Bootcamp sessions, in which I talk about a mental model for text analytics and unstructured data and then present some examples to illustrate the concepts.

If you haven’t already done so, be sure to subscribe for email updates or to either the RSS or iTunes podcast feeds so you don’t miss the next installment when I get into a hands-on example of how to do analytics for text and unstructured data.

This podcast is sponsored by Northwood Advisors, experts in improving performance with data-driven decision-making.

These show notes provide an outline to follow the audio content, as well as some links and enhanced content.

A Mental Model for Text Analytics and Unstructured Data

Factors to Consider for Unstructured Data

1. Degree of Inherent Structure

A. Indeterminate Structure

Indeterminate structure is the least structured of all – think of the SETI (Search for Extraterrestrial Intelligence) project searching for any non-terrestrial, non-random radio waves. To do analysis of indeterminate structure, you have to look for patterns and then identify and/or count them.

Example: Trending topics on Twitter.

B. Unstructured Content with Metadata

Much of what we handle as unstructured data really has a metadata envelope around indeterminate structure: an email, a Tweet, etc.

Example: A Twitter post has:

  • Indeterminate content: the text of the tweet
  • Metadata: User account, Date/Time, Client app/device, IP Address, Location (maybe), etc.

C. Targeted Content within Indeterminate Structure

Typically when we deal with indeterminate structure, we are looking for targeted content. In these cases, the known structure is defined outside the content.

Example: The Acme Widget Corp is monitoring Twitter looking for tweets that contain:

  • Any reference to Acme Widget Corp (in various forms)
  • Tags (@ or #) that are relevant
  • Positive or negative sentiment
  • Actionable feedback

D. Semi-structured Content

  • Structure may vary
  • Mix of structured and unstructured content

Examples:
  • Survey data
  • XML files or other markup languages that provide identifiable structure

2. Processing Considerations

  • Data volume
  • Data sources – number, type, variety, consistency, etc.
  • Static vs. Dynamic
  • Latency requirements

3. Signal vs. Noise

We have to think about signal vs. noise at both ends of the pipe:

On the input side, how we handle signal and noise is tied to the degree of inherent structure and the processing method.

On the output side, what questions are we trying to answer? And what is the minimum amount of ink/pixels that will answer those questions?

Recommended Reading

The Visual Display of Quantitative Information
by Edward R. Tufte

Basic Approaches

The methods of visualizing unstructured data fall into a few basic categories:

1. Density

  • Word clouds
  • Points on a map

2. Classification

  • Graphing based on dimensionality
  • Sentiment analysis

3. Association

  • Networks/Connections

4. Change/flow

  • Charts
  • Dynamic visualizations


Examples:

Google Fusion Tables

UW Interactive Data Lab

Stanford imMens

Stanford Dissertation Browser

Other Resources

Analytics for Marketing

I had the opportunity to present on Marketing Analytics at the Alteryx SoCal User Group. Download my presentation slides here and see a summary of some of the content below.

I use the metaphor of a table to help us communicate the four essential elements of analytics and data-driven business outcomes – the four legs of the table. The ideas are borrowed from the Northwood Advisors site.

1. Aligning analytics with strategy

[Image: 4 legs of analytics]

You know what you want from your business, and you need analytics to help you drive results from your strategy. If your decision-making processes are not aligned with strategy, then your team could be pushing really hard in the wrong direction. Focus on aligning decision-making processes with your strategy to ensure that everyone is pushing in the direction you want them to.

2. Data

The data to support marketing analytics should embrace several categories of data sources:

  • Internal performance data about products/services, employees/teams, customers, etc.
  • Digital media metrics, including social media, web, etc.
  • External market and consumer data

One key task of analytics is to blend these disparate data sources and types of data in order to provide a coherent picture.

3. Expertise

There is a myth held by some executives that analytics just involves pumping a bunch of data through really smart software and getting brilliant answers. They couldn’t be more wrong. Analytics requires expertise across some key disciplines, including but not limited to these:

  • Domain expertise about the operations of your business,
  • Knowledge of the schema and definitions of the data,
  • Statistical modeling expertise to know which software and techniques to employ, and
  • Organizational insight to identify the right people and situations to drive value from analytics, as opposed to “shelfware” analytics.

4. Tools

In Episode 18: BI Trends for 2013 and Beyond I describe the landscape of broad BI platforms and niche tools of analytics. Those observations still hold true, and I recommend that podcast. Here are some of the categories to consider.

  • Advanced statistical analysis and predictive modeling for targeted solutions
  • Geospatial and market analytics
  • Data-driven process optimization for operations and support functions
  • Self-service reporting and analysis to empower decision makers
  • Dashboards and scorecards for actionable, metrics-driven management
  • KPIs (Key Performance Indicators) to focus resources on strategic objectives
  • Driver-based modeling for forecasting and planning
  • Enterprise business intelligence platforms

19: Big Data beyond the hype

The Big Data hype cycle is in full swing. But what is Big Data? How do you know if your data is BIG?

Big Data is not a concretely definable category. You can’t always say exactly what it is, but you know it when you see it. In this episode I define the key characteristics of Big Data that enable us to make more intelligent assessments and decisions regarding Big Data solutions.

Key characteristics of Big Data:

  • Physical Attributes
    • Bigness: physical size of data sets
    • Multi-source: data from multiple sources, especially both internal and external to the organization
    • Multi-structure: tabular data, markup data, audio and video data, geospatial, activity, transactions, snapshots, statuses
    • Fast arriving: streaming, frequently updated, time volatile
  • How we process it
    • Real-time analysis
    • Real-time outputs
      • Delivery to decision makers in real time
      • Delivery to external users (consumers, social/mobile users)
      • Interaction with software APIs
    • Aggregate and details
  • What we do with it
    • Predictive value
    • Pattern recognition, especially unlikely relationships
      • Fuzzy matching
      • Flexible matching
  • Challenges
    • Storage
    • Processing
    • Integration
    • Analysis

Thanks for listening to the Real Time Decisions Webcast – the leading ongoing Business Intelligence podcast focused on practical solutions.

Check out our sponsor, Northwood Advisors:

18: BI Trends

In this episode I talk about “BI Trends for 2013 and beyond.”
Check out the video on the Northwood Advisors web site.
Here is the link to my article: Rising to the Challenges in BI Healthcare
The 4 trends – listen to the audio to get all the details:
1. The maturing of broad BI platforms
For most companies, broad BI platforms can meet most of the needs of most of the people most of the time.
2. Niche tools fill the gaps
Analysts benefit from powerful best-of-breed niche tools across a variety of functions.
3. Mobile BI
Putting usable content in the hands of users in the places it’s most needed is a reality that is here today and won’t be going away.
4. Big data pushing the envelope
Despite the confusing hype, the need is real and there is a growing array of tools to solve the problem.
Thanks for listening to the BI Podcast that focuses on creating a better world through better decisions.

17: Why BI Matters

Previous episodes of the Real Time Decisions Webcast have covered a lot of material about What is BI and How to do BI, but haven’t dug too deeply into Why BI Matters. In this episode, Myron Weber explores this important topic, providing information and inspiration for your company’s BI efforts.
Check out my company web site at
Also, as recommended in the podcast, check out my business coach, Dave Luke at
Thanks for listening to the BI Podcast that strives to change the world with better decisions.

16: Real World BI Requirements

In this episode of the webcast, I discuss Real World BI Requirements.
BI Requirements in the real world must avoid two common unconstructive extremes and provide a constructive alternative.
  • The first extreme is a data-driven approach that fails to account for the objectives and outputs required.
  • The other extreme is a blank-page approach.
The audio podcast explores these concepts and provides a constructive alternative – check it out and share your thoughts in the comments.
Don’t forget to check out my company at to learn about our advisory services for BI Strategy, Roadmap, Governance, and Best Practices.