Writing your own project


Idea

Make a project that calculates how Canadian a string is. As an added bonus, it will determine the emotion of the string based on how its ehs are used.


Requirements

Use:

  • our own components
  • other people's components (components from other NoFlo libraries)
  • can use the FBP language
  • graphs
  • subgraphs
  • tests (use both fbp-spec and Mocha tests to show both options)
  • debugging

Planning

  • To determine how Canadian something is, we want to check the words inside of the string.
    • ~ If it is easily possible, figure out how far words are from each other
    • ~ and their location inside of the string (i.e., at the very beginning, near the end, or at the very end).
    • Check the spelling of words, Canadian vs. elsewhere: Canadian spelling first, then UK, then American.
    • Check the emotion of the word eh using symbols and letter case (e.g., eh, Eh, EH, EH?, eh?!).
  • The output should have the Emotion and the Canadian Score.

Researching

Word Weight

Search Google for a library that can help with word weights and with singularizing and pluralizing words, using a query such as "js library determine weight of words". NaturalNode looks promising.

Spelling

Search Google for the spelling differences, using a query such as "Canadian vs uk vs usa spelling".

The result "Canadian, British and American Spelling" is perfect. The data is in a table with no apparent API, though, so we should convert it into usable data (JSON). Useful queries: "get table as json", "table to json", "table to json jquery".
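
Once the table has been scraped, each row can be turned into one object per word. The following is a minimal sketch in plain JavaScript, assuming the rows have already been extracted as arrays of cell text in the order Canadian, British, American. The function name `rowsToSpellingList`, the sample rows, and the choice to store each spelling in an array are illustrative assumptions, not the output format of any particular scraper.

```javascript
// Hypothetical helper: convert scraped table rows into a list of objects
// keyed by country. Assumes each row is [canadian, british, american].
function rowsToSpellingList(rows) {
  return rows.map(([canadian, british, american]) => ({
    Canadian: [canadian],
    British: [british],
    American: [american],
  }));
}

// Sample rows (illustrative only).
const sampleRows = [
  ['colour', 'colour', 'color'],
  ['centre', 'centre', 'center'],
];

const spellingList = rowsToSpellingList(sampleRows);
console.log(JSON.stringify(spellingList, null, 2));
```

Keeping one object per word with a key per country makes it easy to ask later which country a given spelling belongs to.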


Pseudo code

TODO: move into overall arch or project definition

[How Canadian]
  INPUT=CONTENT(string)
  INPUT=WORDS(array) # words to use as weights
  INPUT=SPELLING(array)
  OUTPUT=EMOTION(string)
  OUTPUT=SCORE(number)

  # if we wanted to swap out emotion to calculate emoji instead of _eh_, for example,
  # we can easily just replace this box. the same goes for any noflo box.
  [Emotion]
    INPUT=CONTENT(string)
    OUTPUT=EMOTION(string)
    [Find Ehs]
      # if it has no _eh_, emotion is flat

      # could also separate to collect and doing each and then sending another stream
      # and putting those back together as a score and using add

      # collects stream, determines emotion of each
      [Determine Emotion]

  [WordScore]
    OUTPUT=SCORE(number)
    INPUT=LIST(array) # control port, because we want to use one for each input
    INPUT=CONTENT(string)

  # since these both calculate score, one positive, one negative,
  # they can be separate instances of the same component
  [CanadianScore] # LIST would use WORDS
  [SpellingScore] # LIST would use SPELLING

    # output of CanadianScore & SpellingScore should be added together to get result
    # score from here is the SCORE
    [noflo/Math/Add]

Overall architecture

Canadianness

Real graph implementation of pseudo code:

graph = "
# (string)
INPORT=SplitContent.IN:CONTENT
# (array)
INPORT=SpellingScore.LIST:SPELLING
# (array)
INPORT=CanadianScore.LIST:WORDS

# (string)
OUTPORT=Emotion.EMOTION:EMOTION
# (int)
OUTPORT=Add.SUM:SCORE

# [core/Split](https://github.com/noflo/noflo-core/blob/master/components/Split.coffee) takes the input
# and sends it to each of the [sockets](http://noflojs.org/api/InternalSocket/) attached to the outport
# ----
# Emotion is a subgraph
SplitContent(core/Split) OUT -> CONTENT Emotion(canadianness/Emotion)
SplitContent OUT -> CONTENT SpellingScore(canadianness/WordScore)
SplitContent OUT -> CONTENT CanadianScore(canadianness/WordScore)

SpellingScore SCORE -> ADDEND Add(math/Add)
CanadianScore SCORE -> AUGEND Add
"

See the graph noflo canadianness graph
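
Read as ordinary function calls, the wiring above amounts to roughly the following. This is only a sketch in plain JavaScript to show the data flow: `emotionOf`, `scoreWith`, and `howCanadian` are hypothetical stand-ins for the Emotion subgraph, the two WordScore instances, and the whole graph. Their bodies are placeholders rather than the real component logic, and both lists are simplified to word-to-weight maps.

```javascript
// Stand-in for a WordScore instance (placeholder logic): sums the weight of
// every word of `content` that appears in `list` (word -> weight).
const scoreWith = (list) => (content) =>
  content
    .toLowerCase()
    .split(/\W+/)
    .reduce(
      (sum, word) =>
        sum + (Object.prototype.hasOwnProperty.call(list, word) ? list[word] : 0),
      0,
    );

// Stand-in for the Emotion subgraph (placeholder logic only).
const emotionOf = (content) => (content.includes('!') ? 'joy' : 'neutral');

// The graph as a function: core/Split hands the same content to all three
// consumers, and math/Add sums the two scores.
function howCanadian(content, words, spelling) {
  const canadianScore = scoreWith(words)(content);
  const spellingScore = scoreWith(spelling)(content);
  return {
    emotion: emotionOf(content),
    score: canadianScore + spellingScore,
  };
}

const graphResult = howCanadian(
  'A toque and a double-double, eh!',
  { toque: 2, eh: 1 },      // illustrative word weights
  { color: -1, colour: 1 }, // illustrative spelling weights
);
console.log(graphResult); // { emotion: 'joy', score: 3 }
```

In the graph, each of these calls is a separate box, so any one of them can be replaced without touching the others.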

Emotion

This can be its own graph loaded inside of the main graph as a subgraph so the whole operation can be represented as a box:

# (string)
INPORT=FindEhs.CONTENT:CONTENT
# (string)
OUTPORT=DetermineEmotion.EMOTION:EMOTION

FindEhs(FindEhs) MATCHES -> CONTENT DetermineEmotion(DetermineEmotion)

See the graph noflo canadianness emotion graph

Writing tests

First in line for testing is fbp-spec.

Add a fbpspec.coffee file to the /spec directory.

Note that fbp-spec cannot send brackets or perform any other special operations. To work around that, you have to write components exclusively for testing and use FBP graphs as fixtures.

The command we use for NoFlo and its flags can be found at noflo-nodejs flags.

fbpspec = require 'fbp-spec'

nodeRuntime =
  label: "NoFlo node.js"
  description: ""
  type: "noflo"
  protocol: "websocket"
  secret: 'notasecret'
  address: "ws://localhost:3333"
  id: "7807f4d8-63e0-4a89-a577-2770c14f8106"
  command: './node_modules/.bin/noflo-nodejs --verbose --catch-exceptions=false --secret notasecret --port=3333 --host=localhost --register=false --capture-output=true --debug=true'

fbpspec.mocha.run nodeRuntime, './spec',
  fixturetimeout: 20000
  starttimeout: 100000

Then, for each test, add a YAML file to the /spec directory. fbp-spec loads every YAML file in /spec.

topic: "canadianness/FindWords"
name: "Find words fbpspec"
cases:
-
  name: 'content eh'
  assertion: 'should find one `eh`'
  inputs:
    word: 'eh'
    surrounding: false
    content: 'eh'
  expect:
    matches:
      equals: 'eh'

Implement components

DetermineEmotion

Import libraries

noflo = require 'noflo'

Useful functions

Function to calculate most common value (the mode)

findMode = (array) ->
  frequency = {}
  maxFrequency = 0
  result = undefined
  for value in array
    frequency[value] = (frequency[value] or 0) + 1
    if frequency[value] > maxFrequency
      maxFrequency = frequency[value]
      result = value
  result
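
For readers more at home in JavaScript, here is a small equivalent sketch of the same mode calculation with a usage example. The name `modeOf` and the sample inputs are illustrative. When two values are tied, the one that reaches the highest count first wins, as in the CoffeeScript version.

```javascript
// Return the most frequent value in `values` (the mode).
// On a tie, the value that reaches the top count first is kept.
function modeOf(values) {
  const frequency = new Map();
  let maxFrequency = 0;
  let mode;
  for (const value of values) {
    const count = (frequency.get(value) || 0) + 1;
    frequency.set(value, count);
    if (count > maxFrequency) {
      maxFrequency = count;
      mode = value;
    }
  }
  return mode;
}

console.log(modeOf(['joy', 'anger', 'joy'])); // joy
console.log(modeOf([]));                      // undefined
```

An empty input yields `undefined`, so callers need their own default for that case.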

Component declaration

Define the input and output ports, and describe their function

exports.getComponent = ->
  c = new noflo.Component
    description: 'Determine the emotion based on the `eh`s found in the content'
    inPorts:
      content:
        datatype: 'string'
        description: 'the content which we look for the word in'
        required: true
    outPorts:
      emotion:
        datatype: 'string'
        description: 'the emotion based on the ehs in the content'
        required: true
      error:
        datatype: 'object'

Processing function

  c.process (input, output) ->

Receiving input

We expect a stream. A single (non-bracketed) input packet is also accepted; it is returned as a stream of length 1.

    return unless input.hasStream 'content'
    contents = input.getStream 'content'

The output will be a single packet (not a stream), hence we drop the openBracket and closeBracket

    contents = contents.filter (ip) -> ip.type is 'data'

extract the data payload from the IP objects

    contents = contents.map (ip) -> ip.data

Component business logic

First find which emotions are present, then calculate which one is most common. This could alternatively be split into two dedicated components.

to hold the emotions found

    matches = []

the emotions we will use

    emotions =
      joy: ['eh!']
      neutral: ['eh']
      amusement: ['eh?', 'Eh?', 'Eh??']
      fear: ['eH??', 'eh??']
      surprise: ['eh !?', 'EH!?']
      anticipation: ['eh?!']
      excitement: ['EH!', 'eH!']
      sadness: ['...eh', '...eh...', '..eh', 'eh..', '..eh..']
      anger: ['EH!?', 'EH?']

go through our content and our emotions then add them to our matches

    for content in contents
      for emotion, data of emotions
        if content in data
          matches.push emotion

if we didn’t get any emotions, default to ‘neutral’

    if matches.length is 0
      mode = 'neutral'

if we did, we need to find the emotion that was the most common

    else
      mode = findMode matches

Send output

Also signals completion by using sendDone()

    output.sendDone emotion: mode
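
The business logic above can be tried out without a NoFlo runtime. The sketch below restates it in plain JavaScript under a few assumptions: `emotionsFor` and `dominantEmotion` are hypothetical names, only a subset of the emotion table is reproduced, and ties are resolved in favour of the emotion that reaches the highest count first.

```javascript
// Subset of the emotion table from the component (illustrative).
const emotionTable = {
  joy: ['eh!'],
  neutral: ['eh'],
  amusement: ['eh?', 'Eh?', 'Eh??'],
  surprise: ['eh !?', 'EH!?'],
  anger: ['EH!?', 'EH?'],
};

// Every emotion whose list contains this exact "eh" variant.
function emotionsFor(eh) {
  return Object.keys(emotionTable).filter((name) => emotionTable[name].includes(eh));
}

// The most common emotion over all variants, or 'neutral' if none matched.
function dominantEmotion(ehs) {
  const counts = {};
  let best = 'neutral';
  let bestCount = 0;
  for (const name of ehs.flatMap(emotionsFor)) {
    counts[name] = (counts[name] || 0) + 1;
    if (counts[name] > bestCount) {
      bestCount = counts[name];
      best = name;
    }
  }
  return best;
}

console.log(dominantEmotion(['eh?', 'Eh?', 'eh!'])); // amusement
console.log(dominantEmotion(['hello']));             // neutral
console.log(emotionsFor('EH!?'));                    // [ 'surprise', 'anger' ]
```

Note that `EH!?` is listed under both surprise and anger, so a single packet can contribute two matches.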

FindWords

Import libraries

noflo = require 'noflo'

Helper functions

Not NoFlo or even component-logic-specific, so nice to keep them separate

Return all RegExp matches on a string

matchAll = (string, regexp) ->
  matches = []
  string.replace regexp, ->
    arr = [].slice.call arguments, 0
    extras = arr.splice -2
    arr.index = extras[0]
    arr.input = extras[1]
    matches.push arr
    return
  if matches.length then matches else []

Extract the actual data of the match result

actualMatches = (matches) ->

because we want to send out an empty array if there are no matches

  return [[]] if matches.length is 0
  matches.map (match) -> match[0]

Component declaration

exports.getComponent = ->
  c = new noflo.Component
    description: 'Find all of the instances of `word` in `content` and send them out in a stream'
    inPorts:
      content:
        datatype: 'string'
        description: 'the content which we look for a word in'
        required: true
      word:
        datatype: 'string' # could be array|string, which would be `all`
        description: 'the word we are looking for instances of'
        control: true
        required: true
      surrounding: # could use a regex but this is a specific case
        datatype: 'boolean'
        description: 'whether to get surrounding characters, symbols before and after until space'
        default: false # if nothing is sent to it, this is the default when `get`ting from it
        control: true
    outPorts:
      matches:
        datatype: 'string'
        description: 'the resulting findings as a stream of data packets'
        required: true

Processing function

To preserve streams, forward brackets from the primary inport content to the output.

  c.forwardBrackets =
    content: 'matches'
  c.process (input, output) ->

Receiving input data

We need both a word and content to start processing. Since word is a control port, the latest value is kept, so there is no need to send it continuously.

    return unless input.hasData 'word', 'content'
    [ word, content ] = input.getData 'word', 'content'

Component business logic

Since we are sending out multiple data IPs, we want to wrap them in brackets. TODO: make exception safe

    output.send matches: new noflo.IP 'openBracket', content

do our word processing

    r = /([.?!]*eh[.?!]*)/gi
    matches = matchAll content, r
    matches = actualMatches matches

Sending output

for each of our matches, send them out

    for match in matches

If you just send content, it is automatically put in a data IP, so this is the same as output.send matches: new noflo.IP 'data', match

      output.send matches: match

this is the same as doing output.send and then output.done

    output.sendDone matches: new noflo.IP 'closeBracket', content
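
To see what the regular expression in this component actually captures, it can be run on its own. This is a minimal sketch in plain JavaScript; `findEhs` is a hypothetical helper name and the sample sentences are made up.

```javascript
// Same pattern as in the component: "eh" in any letter case, together with
// any '.', '?' or '!' characters directly before or after it.
const EH_PATTERN = /([.?!]*eh[.?!]*)/gi;

// Return every match as a plain string, in order of appearance.
function findEhs(content) {
  return Array.from(content.matchAll(EH_PATTERN), (match) => match[0]);
}

console.log(findEhs('Nice day, eh? ...eh... EH!?'));
// [ 'eh?', '...eh...', 'EH!?' ]
console.log(findEhs('No match here'));
// []
```

Because the pattern has no word boundaries, it also matches `eh` inside longer words such as `behind`.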

WordScore

Import libraries

noflo = require 'noflo'
natural = require 'natural'
tokenizer = new natural.WordTokenizer()

Component declaration

exports.getComponent = ->
  c = new noflo.Component
    description: 'Find how the input words compare against the list of weighted words'
    inPorts:
      list:
        datatype: 'array'
        description: 'list of words we will use with the list of content'
        control: true
        required: true
      content:
        datatype: 'string'
        description: 'the content which we will determine the score of'
        required: true
    outPorts:
      score:
        datatype: 'number'
        description: 'the resulting number of comparing the content with the list'
        required: true

Processing function

This component reads the whole content stream itself with getStream, so automatic bracket forwarding is turned off by setting forwardBrackets to an empty object.

  c.forwardBrackets = {}

  c.process (input, output) ->

Receive input

    return unless input.hasStream 'content'
    return unless input.hasData 'list'
    content = input.getStream('content').filter((ip) -> ip.type is 'data').map((ip) -> ip.data)
    list = input.getData 'list'

there can be multiple pieces of content

    content = content.join('\n')

Component business logic

our base score we will send out

    score = 0

splits content into an array of words

    tokens = tokenizer.tokenize content

if the list has the word in it, return its score; otherwise, 0 points

    wordScore = (word) ->
      if list[word]?
        return list[word]
      else
        return 0

go through each of the comparisons in the list and score the word by its spelling: Canadian: 1, American: -1, British: 0.5, none: 0

    spellingScore = (word) ->
      for comparison in list
        if word not in comparison["American"]
          if word in comparison["Canadian"]
            return 1
          else if word in comparison["British"]
            return 0.5
        else
          return -1

      return 0

if it has this, it is a spelling list

    if list[0]?["Canadian"]?
      scoringFunction = spellingScore

otherwise it is an object list of words with scores

    else
      scoringFunction = wordScore

use this to singularize and pluralize each word

    nounInflector = new natural.NounInflector()

go through each token in the content

    for data in tokens
      plural = nounInflector.pluralize data
      singular = nounInflector.singularize data

score the plural and singular forms only when they differ from the word itself, so the same form is not counted twice

      if plural isnt data
        score += scoringFunction plural
      if singular isnt data
        score += scoringFunction singular

      score += scoringFunction data

Send output

    output.sendDone score: score
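
The two scoring modes can be exercised outside NoFlo as well. The following plain JavaScript sketch makes several simplifying assumptions: it splits on non-letter characters instead of using the tokenizer from `natural`, it skips the singular and plural inflection step, and it uses made-up sample lists. `scoreWord`, `scoreSpelling`, and `scoreContent` are hypothetical names.

```javascript
// Weighted word list: word -> score (illustrative values).
const weightedWords = { eh: 1, toque: 2, hockey: 1 };

// Spelling list: one comparison per word, each country holding its
// accepted spellings (illustrative values).
const spellings = [
  { Canadian: ['colour'], British: ['colour'], American: ['color'] },
  { Canadian: ['tire'], British: ['tyre'], American: ['tire'] },
];

// Score of a word in a weighted list, 0 if the word is not listed.
function scoreWord(list, word) {
  return Object.prototype.hasOwnProperty.call(list, word) ? list[word] : 0;
}

// American: -1, Canadian: 1, British: 0.5, none: 0.
// The American check comes first, as in the component.
function scoreSpelling(list, word) {
  for (const comparison of list) {
    if (comparison.American.includes(word)) return -1;
    if (comparison.Canadian.includes(word)) return 1;
    if (comparison.British.includes(word)) return 0.5;
  }
  return 0;
}

// Pick the scorer from the shape of the list, then sum over all words.
function scoreContent(list, content) {
  const isSpellingList = Array.isArray(list) && Boolean(list[0] && list[0].Canadian);
  const scorer = isSpellingList ? scoreSpelling : scoreWord;
  return content
    .split(/[^A-Za-z]+/)
    .filter(Boolean)
    .reduce((total, word) => total + scorer(list, word), 0);
}

console.log(scoreContent(weightedWords, 'A toque for hockey, eh')); // 4
console.log(scoreContent(spellings, 'colour color tyre'));          // 0.5
```

Because the American check runs first, a spelling shared by Canada and the USA (`tire` in the sample list) scores -1 rather than 1.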

Providing a JavaScript API

TODO: write