Catalyst

Objective:

  • Stop teams across the organization from reinventing the wheel in big data development
  • Build a software library/framework that provides abstractions and tools to ease their development lifecycle

Approach:

  • Consulted different teams to understand their needs
  • Gathered a list of:
    • tools (like CI/CD pipelines, build files, Nexus, …)
    • abstractions (like IO utils, config parsers, …)
  • Kept the software design minimal, leaving room for more features to be added
  • Adopted the most commonly used design patterns, functional or object-oriented, depending on the problem at hand
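
As an illustration of the kind of abstraction listed above, here is a minimal sketch in Python of a config parser written in a functional style (a pure function) next to an object-oriented IO interface. Every name in it (parse_config, RecordSource, InMemorySource, load_source, and the source.type / source.records keys) is hypothetical and chosen only for this example; it does not reflect the framework's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, Tuple


def parse_config(text: str) -> Dict[str, str]:
    """Functional side: a pure function that parses 'key = value' lines.

    Blank lines, '#' comments, and lines without '=' are skipped.
    """
    stripped = (raw.strip() for raw in text.splitlines())
    pairs = (
        line.split("=", 1)
        for line in stripped
        if line and not line.startswith("#") and "=" in line
    )
    return {key.strip(): value.strip() for key, value in pairs}


class RecordSource(ABC):
    """Object-oriented side: one interface over different storage backends."""

    @abstractmethod
    def read(self) -> Iterator[str]:
        """Yield records one at a time."""


@dataclass(frozen=True)
class InMemorySource(RecordSource):
    """Hypothetical backend used here so the sketch runs without a cluster."""

    records: Tuple[str, ...]

    def read(self) -> Iterator[str]:
        return iter(self.records)


def load_source(config: Dict[str, str]) -> RecordSource:
    """Factory: pick a backend from the parsed config (keys are assumptions)."""
    kind = config.get("source.type", "memory")
    if kind == "memory":
        raw = config.get("source.records", "")
        records = tuple(item.strip() for item in raw.split(",") if item.strip())
        return InMemorySource(records)
    raise ValueError(f"unsupported source.type: {kind}")


if __name__ == "__main__":
    cfg = parse_config("# demo\nsource.type = memory\nsource.records = a, b, c\n")
    for record in load_source(cfg).read():
        print(record)
```

A project would add its own RecordSource subclasses (for example, one per storage system) and register them in the factory, so calling code depends only on the shared interface.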

Results:

  • Small, portable framework that can be incorporated into any big data project
  • Supports Apache Spark, Apache Hadoop, Apache Kafka, Apache HBase, MongoDB, … and other big data technologies
  • Supports automation through scripting and DevOps tools such as Docker, Jenkins, Maven/Gradle/sbt, …