# Airbyte General
## Introduction
[Airbyte](https://www.airbyte.com) is an open-source data integration platform that simplifies the process of extracting, loading, and transforming (ELT) data across various systems. Launched in 2020, Airbyte provides pre-built connectors to sync data from a wide range of sources to destinations, making data movement more accessible and scalable for businesses.
<br>
![[Tooling Airbyte.png]]
*Extraction setup with Airbyte syncing raw source data to the analytics database*
## Features
Airbyte offers a robust set of capabilities that streamline data integration and pipeline management:
- **Pre-Built Connectors:** Supports 300+ data sources and destinations, including databases, APIs, and cloud services.
- **Custom Connector Development:** Allows users to build and deploy custom connectors with ease.
- **ELT-Based Architecture:** Extracts and loads data before applying transformations, optimizing performance.
- **Incremental Data Syncs:** Reduces processing time by syncing only new or updated records.
- **Scheduler & Monitoring:** Automates syncs and provides monitoring tools to track pipeline performance.
- **Data Transformation:** Supports integration with dbt (data build tool) for in-warehouse transformations.
- **Self-Hosted & Cloud Deployment:** Offers flexibility to run Airbyte on-premise or via Airbyte Cloud.
- **Role-Based Access Control (RBAC):** Ensures secure data handling with granular permission settings.
## Applications
Airbyte is widely used across industries and data teams for various data movement and integration tasks:
- **Data Warehousing:** Consolidating data from multiple sources into a centralized repository like Snowflake, BigQuery, or Redshift.
- **ETL/ELT Pipelines:** Automating data movement workflows while ensuring data consistency.
- **Analytics & Business Intelligence:** Enabling teams to integrate data for dashboards, reporting, and insights.
- **Machine Learning & AI:** Providing structured datasets for model training and experimentation.
- **Cloud Data Migration:** Facilitating the seamless migration of data between cloud platforms.
- **Compliance & Auditing:** Ensuring accurate and auditable data syncs for regulatory compliance.
## Best Practices
To make the most of Airbyte’s capabilities, follow these best practices:
- **Optimize Sync Schedules:**
- Schedule syncs based on data freshness requirements to avoid unnecessary resource usage.
- **Use Incremental Syncs When Possible:**
- Reduce processing overhead by syncing only new or modified records.
- **Monitor Pipeline Performance:**
- Set up alerts and logs to detect failures or anomalies in data syncs.
- **Leverage dbt for Transformations:**
- Keep transformations modular and manageable by integrating dbt workflows.
- **Secure Data Transfers:**
- Implement access controls and encryption to protect sensitive data.
- **Test & Validate Connectors:**
- Regularly validate data integrity and schema consistency when syncing between sources and destinations.
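To illustrate monitoring and triggering syncs programmatically, the sketch below uses Airbyte's public API; the instance URL, token variable, and connection ID are placeholders to adapt to your own deployment:

```sh
# Sketch: trigger a sync and list recent jobs via the Airbyte API.
# $AIRBYTE_URL, $AIRBYTE_TOKEN and the connection ID are placeholders.
AIRBYTE_URL="${AIRBYTE_URL:-http://localhost:8000}"
CONNECTION_ID="replace-with-your-connection-id"

# Trigger a manual sync for one connection
curl -s -X POST "$AIRBYTE_URL/api/public/v1/jobs" \
  -H "Authorization: Bearer $AIRBYTE_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"connectionId\": \"$CONNECTION_ID\", \"jobType\": \"sync\"}"

# Inspect the most recent jobs to detect failures
curl -s "$AIRBYTE_URL/api/public/v1/jobs?connectionId=$CONNECTION_ID&limit=5" \
  -H "Authorization: Bearer $AIRBYTE_TOKEN"
```

The same endpoints back the scheduler UI, so a periodic job-status check like the second call is a simple basis for alerting.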
## Pricing
Airbyte offers flexible [pricing plans](https://airbyte.com/pricing) to accommodate various data integration needs:
1. **Open Source**: Free and self-hosted, ideal for practitioners seeking full control over their data pipelines without governance requirements.
2. **Cloud**: A fully managed service starting at $10 per month, which includes 4 credits. Additional credits are priced at $2.50 each. This plan is suitable for teams looking to automate ELT pipelines effortlessly.
3. **Team**: Capacity-based pricing, cloud-hosted by Airbyte, designed for organizations that require scalability, governance, and security while simplifying pipeline management.
4. **Enterprise**: Capacity-based pricing with self-hosting, catering to organizations needing enhanced security, compliance, and full infrastructure control.
## Usage
The Airbyte extractor is used in almost all of our clients' setups, sometimes as the core extractor and sometimes as a complement to existing tooling. We use Airbyte to extract data from accounting systems such as [[Exact Online]], from the CRM systems [[Hubspot]], [[Pipedrive]], and [[Tribe]], and from in-house software logging events stored in [[InfluxDB]].
# Deploying own Airbyte instance
## Deploy using abctl
### About
This document is a supplement describing how to install the latest version of Airbyte on your own Google Cloud Platform VM.
**Version**
For `abctl` v0.24.0
### Creating VM instance
#### Settings of the VM
- `e2-standard-2` (2 vCPU, 8 GB memory)
- HDD 30 GB - 50 GB

The official guide for [deploying with abctl on EC2](https://docs.airbyte.com/deploying-airbyte/abctl-ec2) applies mostly unchanged to Google Compute Engine.
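As a sketch, a matching VM can be created with the `gcloud` CLI; the instance name, zone, and image below are assumptions to adapt:

```sh
# Sketch: create a VM matching the specs above
# (instance name, zone and image are placeholders)
gcloud compute instances create airbyte-vm \
  --machine-type=e2-standard-2 \
  --boot-disk-size=50GB \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --zone=europe-west4-a
```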
### Docker
#### Docker official method
Docker suggests [uninstalling conflicting Docker packages](https://docs.docker.com/engine/install/ubuntu/) that may come with the OS distribution.
*On Google Compute Engine no such packages were preinstalled.*
##### Install using the `apt` repository
[Docs](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository)
```sh
# Add Docker's official GPG key:
sudo apt-get update && sudo apt-get upgrade
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "${UBUNTU_CODENAME:-$VERSION_CODENAME}") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
```
##### Install latest docker package
```sh
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```
#### Docker from distro apt repository
The following steps are similar to the first 5 steps from [abctl-ec2](https://docs.airbyte.com/deploying-airbyte/abctl-ec2)
1. Install the Docker engine. If Docker is already available (test by running the `hello-world` example), skip this step.
```sh
sudo apt install docker.io -y
```
2. Add the current user to the `docker` group
```sh
sudo usermod -aG docker $USER
```
Log out using `exit` and log back in.
Verify the group was applied by running `groups`; `docker` should be listed.
3. Start docker
```sh
sudo systemctl start docker
sudo systemctl enable docker
```
4. Download the latest version of abctl and install it in your path:
1. Using Homebrew: `brew tap airbytehq/tap && brew install abctl`
2. Using curl
```sh
curl -LsfS https://get.airbyte.com | bash -
```
5. Run
`abctl local install`
or, with a values file,
`abctl local install --values values.yaml`
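After the install completes, the login credentials for the local instance can be retrieved with `abctl` (assuming the default local deployment):

```sh
abctl local credentials
```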
### Troubleshooting deploy using abctl
```log
ERROR Failed to install airbyte/airbyte Helm Chart
ERROR Unable to install Airbyte locally
ERROR unable to install airbyte chart: unable to install helm: Kubernetes cluster unreachable: Get "https://127.0.0.1:34383/version": EOF
```
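This error means the kind cluster that `abctl` creates was not reachable, commonly because Docker is not running or the VM ran out of memory. As a hedged first remedy, assuming nothing worth keeping is on the instance yet, tear the local deployment down and retry:

```sh
# Verify Docker itself is healthy first
docker ps

# Remove the broken local deployment, then reinstall
abctl local uninstall
abctl local install
```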
## Deploy using Kubernetes on a single VM using Kind
*This is a work in progress article*
### Installing Kubernetes on VM
In order to install Kubernetes on a single VM (sufficient for current workloads) we use [kind](https://kind.sigs.k8s.io).
#### Connect with VM
### Update VM
```sh
sudo apt update && sudo apt upgrade -y
```
### Install kind requirements
#### Go
```sh
sudo apt install golang-go
```
**Set envvar `GOPATH`**
`GOPATH` is probably not set yet:
`echo 'export PATH=$PATH:$(go env GOPATH)/bin' >> ~/.bashrc && source ~/.bashrc`
#### Docker
```sh
sudo apt install docker.io -y
```
**Add the current user to the `docker` group**
```sh
sudo usermod -aG docker $USER
```
Log out using `exit` and log back in.
Verify the group was applied by running `groups`; `docker` should be listed.
### Install Kind
```sh
# Pin a specific kind release instead of @latest if reproducibility matters
go install sigs.k8s.io/kind@latest && kind create cluster
```
### Install Helm
```sh
sudo snap install helm
```
### Install kubectl
Install `kubectl` via the native package manager:
```sh
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
sudo chmod 644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo chmod 644 /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl
```
### Airbyte on VM
### Add Helm Airbyte Chart repo and install Airbyte
See [Airbyte Helm Charts](https://airbytehq.github.io/helm-charts/)
```sh
helm repo add airbyte https://airbytehq.github.io/helm-charts
helm repo update
```
Create a namespace
`kubectl create namespace airbyte`
### Install Airbyte
```sh
helm install airbyte airbyte/airbyte --namespace airbyte
```
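Installation can take several minutes; the pods coming up can be followed with:

```sh
kubectl get pods -n airbyte --watch
```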
### Access Airbyte instance
When the installation has completed successfully you'll see the following message:
```sh
Get the application URL by running these commands:
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl -n default port-forward deployment/airbyte-webapp 8080:8080
```
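Note that this message assumes the `default` namespace; since Airbyte was installed into the `airbyte` namespace here, pass `-n airbyte` instead (the exact deployment name may differ per chart version):

```sh
kubectl -n airbyte port-forward deployment/airbyte-webapp 8080:8080
```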
## Values.yaml configuration
When installing using `abctl`, a `values.yaml` file can be supplied with configuration overrides.
```sh
abctl local install \
--values ./values.yaml
```
Example of a `values.yaml`:
```yaml
# values.yaml
global:
  auth:
    cookieSecureSetting: "false"
  env_vars:
    JOB_MAIN_CONTAINER_CPU_REQUEST: "0.5"
    JOB_MAIN_CONTAINER_CPU_LIMIT: "1.5"
    JOB_MAIN_CONTAINER_MEMORY_REQUEST: "0.5Gi"
    JOB_MAIN_CONTAINER_MEMORY_LIMIT: "3Gi"
```