Workflows

Overview

The smallest unit of processing that can be managed manually is a job. The framework divides each job into multiple tasks. Each task is assigned to one service and one object (called the current object). A service is free to disregard the current object and work on any other objects, or even on all objects in the job. An object is just a collection of attributes, much like a JSON document. Services can append attributes to an existing object or create new objects. The flow of all objects (and, by extension, the flow of tasks in a particular job) is defined by a workflow file, which is an XML document interpreted by the framework.

HSN Workflow Language

Each workflow is written in the HSN Workflow Language (HWL). Let’s take a simple workflow document and describe its components.

<workflow>
   <description>
      Very simple workflow.
   </description>
   <process id="main">
      <service name="feeder-list" id="feeder0">
         <parameter name="uri">file.txt</parameter>
         <parameter name="domain_info">true</parameter>
         <output process="process_url" />
      </service>
   </process>
   <process id="process_url">
      <service name="webclient" id="webclient0">
         <parameter name="link_click_policy">0</parameter>
         <output process="report" expr='type=="url"' />
      </service>
      <service name="reporter">
         <parameter name="serviceName">webclient</parameter>
         <parameter name="template">webclient.jsont</parameter>
      </service>
      <service name="js-sta" id="javascript0" />
      <service name="reporter" id="reporter2" ignore_errors="INPUT">
         <parameter name="serviceName">js-sta</parameter>
         <parameter name="template">js-sta.jsont</parameter>
      </service>
      <service name="reporter">
         <parameter name="serviceName" />
         <parameter name="template">url.jsont</parameter>
      </service>
   </process>

   <process id="report">
      <service name="reporter">
         <parameter name="serviceName">webclient</parameter>
         <parameter name="template">webclient.jsont</parameter>
      </service>
      <service name="js-sta"/>
      <service name="reporter" id="reporter2"  ignore_errors="INPUT">
         <parameter name="serviceName">js-sta</parameter>
         <parameter name="template">js-sta.jsont</parameter>
      </service>
      <service name="reporter">
         <parameter name="serviceName" />
         <parameter name="template">url.jsont</parameter>
      </service>
   </process>
</workflow>

Let’s have a look at the tags in the document above, in the order in which they appear.

description

   <description>
      Very simple workflow.
   </description>

The description tag holds a textual description of the workflow. It is used only when displaying workflow status.

process

   <process id="main">
      <service name="feeder-list" id="feeder0">
         <parameter name="uri">file.txt</parameter>
         <parameter name="domain_info">true</parameter>
         <output process="process_url" />
      </service>
   </process>

The process tag describes one of the possible object flows. A service that produces output objects can send them to any process you specify, which lets you control the flow of objects produced by that service, as you will see later on. Each process has an id, which is used to refer to it elsewhere in the workflow. A process is usually composed of service tags.
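
As a minimal sketch of how processes refer to each other, the fragment below defines two processes, with a service in the first one handing its new objects to the second. The process id analyze is made up for illustration; the service names and parameters are taken from the example workflow above.

```xml
<workflow>
   <description>
      Two processes linked by an output tag.
   </description>
   <!-- First flow: feeder-list is given file.txt as its uri parameter -->
   <process id="main">
      <service name="feeder-list">
         <parameter name="uri">file.txt</parameter>
         <!-- New objects are sent to the process with id="analyze" -->
         <output process="analyze" />
      </service>
   </process>
   <!-- Second flow; "analyze" is a hypothetical id used only in this sketch -->
   <process id="analyze">
      <service name="webclient">
         <parameter name="link_click_policy">0</parameter>
      </service>
   </process>
</workflow>
```

The value of the process attribute on the output tag has to match the id of a process defined in the same workflow.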

service

      <service name="reporter" id="reporter2" ignore_errors="INPUT">
         <parameter name="serviceName">js-sta</parameter>
         <parameter name="template">js-sta.jsont</parameter>
      </service>

The service tag directs the current object to a specific service. The service can use the current object in its processing, extend it with new attributes, or create new objects. Services usually work sequentially, i.e. attributes created by one service are used by a later one. In the example workflow, webclient produces JavaScript contexts, which are later analyzed by the js-sta service. The service tag has two main attributes: name and id. The name attribute corresponds to the service type. The id attribute is optional and is used to refer to the service from the command line. There is also an ignore_errors attribute, which can be used to ignore a specific class of errors. In the snippet above it is set on reporter2: all input errors are ignored, because the template provided to the reporter may be invalid in some cases.

A service tag can contain additional tags: parameter and/or output.
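
To make the sequential behaviour concrete, here is a small sketch of a single process assembled only from services that appear in the example workflow. The comments restate what the text above says about each service; they assume the framework's XML parser accepts standard XML comments.

```xml
<process id="process_url">
   <!-- Listed first; per the description above it produces JavaScript contexts -->
   <service name="webclient" id="webclient0">
      <parameter name="link_click_policy">0</parameter>
   </service>
   <!-- Listed next; analyzes what webclient added. The optional id is omitted here -->
   <service name="js-sta" />
   <!-- ignore_errors="INPUT": input errors raised by this reporter are ignored -->
   <service name="reporter" id="reporter2" ignore_errors="INPUT">
      <parameter name="serviceName">js-sta</parameter>
      <parameter name="template">js-sta.jsont</parameter>
   </service>
</process>
```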

service/parameter

         <parameter name="serviceName">js-sta</parameter>

This tag declares a parameter for the service. Parameters define how the current object will be processed by the service. A parameter is simply a key/value pair: the key is the name attribute and the value is the content of the parameter tag. The parameters accepted by each service are described in a separate section.

service/output

         <output process="process_url" />

This tag redirects new objects to the process whose id equals the value of the process attribute. Additionally, you can specify an OGNL expression (in the expr attribute) that a new object must satisfy in order to be redirected to that process. Multiple output tags are supported. If the service does not produce any new objects, the output tag is simply ignored. All objects created by the service are immediately redirected to the process specified in this tag. If you want to report them, remember to put a reporter in the child process (as in the example).
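
The following sketch shows how several output tags with expr conditions might be combined on one service. The first output tag is copied from the example workflow; the second one is purely illustrative, since the type value "file" and the process id process_file are assumptions and do not appear in the original workflow.

```xml
<service name="webclient" id="webclient0">
   <parameter name="link_click_policy">0</parameter>
   <!-- From the example: new objects whose type attribute equals "url" go to the report process -->
   <output process="report" expr='type=="url"' />
   <!-- Hypothetical: an assumed type value "file" routed to an assumed process "process_file" -->
   <output process="process_file" expr='type=="file"' />
</service>
```

How the framework treats a new object that matches several expressions, or none of them, is not covered in this section, so it is worth checking before relying on either behaviour.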