No-code, self-runnable micro ETL
March 5, 2023
ETL: Extract, Transform, Load
This code can extract data from a given source, transform it however you need, and load the result to a given destination. It's designed to be easy to use, with everything configured through a single configuration file.
It's meant to be used as a standalone app that is triggered and run by a CI/CD pipeline. The main difference from other existing ETL tools is that it puts ease of use before anything else.
You don't need to code anything for most basic use cases, and it can be extended for more specific sources, destinations, formats, and workflows.
Note: Because of that choice, it has inherent limitations. For example, it can only handle one data flow per config file (although that flow may pass through multiple transformers).
Take a look at this example config:
extract:
  pull:
    type: file
    uri: ./demo/data_in.csv
  read:
    format: csv
    options:
      trim: true
      with_header: ["Name", "Sex", "Age", "Height", "Weight"]
transform:
  filter:
    type: query
    options:
      where: 'Age > :min'
      parameters:
        min: 30
  mapping:
    type: expressive
    map:
      out.name: in.Name
      out.sex: in.Sex
      out.age_in_sec: 'in.Age * 365 * 24 * 60 * 60'
load:
  write:
    format: json
  push:
    type: http
    uri: https://webhook.site/f24c112b-8344-4fe3-a9e5-53baf36c912f
    options:
      headers:
        - 'Authorization: Basic 9e222b3b7647c7'
I bet you can understand what's going on just by reading the file!
The code reads a CSV file from the local repository, keeps only the rows where Age is greater than 30, maps the remaining fields to a new structure, and sends the result to an HTTP API as JSON. Everything is described in one single file.
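As a rough illustration of what the config above describes, here is a minimal Python sketch of the same flow. It is not ETL Runner's actual implementation: the function name run_flow, the sample rows, and the reading of with_header as "the file has no header row, so use these column names" are all assumptions made for the example. The HTTP push step is left out so the sketch runs offline.

```python
import csv
import io
import json

# Column names taken from the `with_header` option in the example config.
HEADER = ["Name", "Sex", "Age", "Height", "Weight"]
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Hypothetical sample data, standing in for ./demo/data_in.csv.
SAMPLE_CSV = "Alex, M, 41, 74, 170\nBert, M, 29, 68, 166\nElly, F, 30, 66, 124\n"


def run_flow(csv_text, min_age=30):
    """Extract rows from CSV text, filter and map them, return a JSON string."""
    # Extract: parse the CSV and trim every cell (the `trim: true` option).
    rows = [
        dict(zip(HEADER, (cell.strip() for cell in row)))
        for row in csv.reader(io.StringIO(csv_text))
        if row  # skip blank lines
    ]

    # Transform, step 1: the filter `Age > :min` with min = 30.
    kept = [row for row in rows if int(row["Age"]) > min_age]

    # Transform, step 2: the mapping from `in.*` fields to `out.*` fields.
    mapped = [
        {
            "name": row["Name"],
            "sex": row["Sex"],
            "age_in_sec": int(row["Age"]) * SECONDS_PER_YEAR,
        }
        for row in kept
    ]

    # Load: serialize to JSON (the `write` step); pushing over HTTP is omitted.
    return json.dumps(mapped)


if __name__ == "__main__":
    print(run_flow(SAMPLE_CSV))
```

With the sample rows above, only Alex passes the filter: Elly is exactly 30, and the condition is strictly greater than 30.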
The project has all the needed documentation and examples for each ETL component.
Because ETL Runner is tied to CI/CD, the other important part is how to trigger runs and commit results back to the repository. For both, the project comes with specific documentation for the major code hosting providers (GitLab and GitHub for now).
You can find ETL-Runner on GitHub and GitLab (just fork the repository and make it your own).
Note: Keep in mind that this app isn't designed for large volumes or intensive data processing. If you plan to trigger the ETL more than twice a day, you may need to look for something more powerful. Depending on your usage, frequent runs could even go against some providers' terms of service.