Skip to content

Overview

Pipeline Hosting

For guides on how the analytics and pipelines written in Corridor can be deployed to Production - refer to the Direct to Production guide.

Guides that cover the installation, configuration, and scaling of Self-Hosted Corridor instances for analytical use.

Architectural Overview

The Analytical layer of Corridor for analysts to test and validate their logic and get the required approvals and compliance checks. The production layer is NOT described here as Corridor is isolated from the Production side.

Corridor is divided into various components to keep it modular and enable easy scaling for cloud-based deployments and also to manage high loads without much change. Each of the components can be installed on separate machines or any subset can be installed in the same machine.

The components are divided into:

  • Web Application Server: The web application server for the analytical UI of the platform
  • API Server: The API for business logic
  • API - Celery worker: The worker for asynchronous API tasks
  • Spark - Celery worker: The worker for asynchronous spark tasks
  • Jupyter Notebook: The Jupyter Notebook server for free-form analytical use
  • File Management: The file management server to manage files
  • Metadata Database (SQL RDBMS): The database with all metadata provided in the Web Application
  • Messaging Queue (Redis): The messaging queue to orchestrate worker tasks
  • Authentication Provider: The identity and auth provider for access and permissions
  • Proxy / Load Balancers: Load Balancers / Proxies to simplify the install

Here is a typical network diagram of how the installation would look like: Network Diagram