Catalyst

Objective:

Stop teams across the organization from reinventing the wheel in big data development
Build a software library/framework that provides abstractions and tools to ease their development lifecycle

Approach:

Consulted different teams to understand their needs
Gathered a list of:
- tools (like CI/CD pipelines, build files, Nexus,..)
- abstractions (like IO utils, config parsers,..)
Kept the software design minimal, leaving room for more features to be added
Adopted the most commonly used design patterns, functional or object-oriented, depending on the problem at hand
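
Illustration (not taken from Catalyst itself): the sketch below shows roughly what a shared config parser and IO abstraction of this kind could look like, combining a functional helper with an object-oriented interface. The language (Python) and every name in it (parse_config, Source, InMemorySource, SOURCES, open_source) are hypothetical and chosen only for this example; the framework's actual API is not described here.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


def parse_config(text: str) -> Dict[str, Any]:
    """Functional side: parse a JSON string into a flat dict with dotted keys."""

    def flatten(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
        flat: Dict[str, Any] = {}
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                # Nested sections become "section.key" entries.
                flat.update(flatten(value, path))
            else:
                flat[path] = value
        return flat

    return flatten(json.loads(text))


class Source(ABC):
    """Object-oriented side: one small interface per kind of input (hypothetical)."""

    @abstractmethod
    def read(self) -> List[Any]:
        ...


class InMemorySource(Source):
    """Stand-in for a real reader (files, Kafka, HBase, ...), used here for illustration."""

    def __init__(self, rows: List[Any]) -> None:
        self.rows = list(rows)

    def read(self) -> List[Any]:
        return list(self.rows)


# Registry mapping a config value to a factory, so teams can add new
# source types without modifying open_source.
SOURCES: Dict[str, Callable[[Dict[str, Any]], Source]] = {
    "memory": lambda cfg: InMemorySource(cfg.get("source.rows", [])),
}


def open_source(cfg: Dict[str, Any]) -> Source:
    """Pick the source implementation named by the 'source.type' config key."""
    return SOURCES[cfg["source.type"]](cfg)


if __name__ == "__main__":
    cfg = parse_config('{"source": {"type": "memory", "rows": [1, 2, 3]}}')
    print(open_source(cfg).read())
```

Keeping the registry as a plain dictionary is one way to leave room for more features: a new source type is a single new entry rather than a change to existing code.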

Results:

Small, portable framework that can be incorporated into any big data project
Supports Apache Spark, Apache Hadoop, Apache Kafka, Apache HBase, MongoDB, … and other big data technologies
Supports automation through scripting and DevOps tools like Docker, Jenkins, Maven/Gradle/sbt,…