No-code, self-runnable micro ETL

March 5, 2023

ETL: Extract, Transform, Load

This code can extract data from a given source, transform it however you need, and load the result to a given destination. It's designed to be easy to use, with everything configured through a single configuration file.

It's meant to be used as a stand-alone app that is triggered and run by a CI/CD pipeline. The main difference from other existing ETL tools is the choice to put ease of use before anything else.

You don't need to code anything for most basic use cases, and it can be extended for more specific sources, destinations, formats, and workflows.

Note: For that reason, it has inherent limitations. For example, it can only handle one data flow per config file (although that flow may pass through multiple transformers).

Take a look at this example config:

extract:
  pull:
    type: file
    uri: ./demo/data_in.csv
  read:
    format: csv
    options:
      trim: true
      with_header: ["Name", "Sex", "Age", "Height", "Weight"]

transform:
  filter:
    type: query
    options:
      where: 'Age > :min'
      parameters:
        min: 30
  mapping:
    type: expressive
    map:
      out.name: in.Name
      out.sex: in.Sex
      out.age_in_sec: 'in.Age * 365 * 24 * 60 * 60'

load:
  write:
    format: json
  push:
    type: http
    uri: https://webhook.site/f24c112b-8344-4fe3-a9e5-53baf36c912f
    options:
      headers:
        - 'Authorization: Basic 9e222b3b7647c7'

I bet you can understand what's going on just by reading the file!

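The code reads a CSV file from the repository, keeps the rows where Age is greater than 30, maps each remaining row to a name, a sex, and an age in seconds, and sends the result to an HTTP endpoint as JSON. Everything is described in one single file, which keeps it easy to configure.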
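To make the data flow concrete, here is a rough Python sketch of what the config above describes. It is not how the project is implemented. The function names, the sample rows, and the assumption that the CSV file starts with a header row are all made up for illustration, and the HTTP push is left out so the sketch runs offline.

```python
import csv
import io
import json

# Made-up sample rows standing in for ./demo/data_in.csv
# (assumption: the file starts with a header row).
SAMPLE_CSV = """Name, Sex, Age, Height, Weight
Alex, M, 41, 74, 170
Bert, M, 25, 68, 166
Elly, F, 30, 66, 124
Fran, F, 33, 66, 115
"""

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def extract(text):
    """Read CSV text into a list of dicts, trimming whitespace (like trim: true)."""
    reader = csv.reader(io.StringIO(text))
    header = [cell.strip() for cell in next(reader)]
    return [
        dict(zip(header, (cell.strip() for cell in row)))
        for row in reader
        if row  # skip blank lines
    ]


def transform(rows, min_age=30):
    """Keep rows where Age > min_age, then map them to the output fields."""
    kept = (row for row in rows if int(row["Age"]) > min_age)
    return [
        {
            "name": row["Name"],
            "sex": row["Sex"],
            "age_in_sec": int(row["Age"]) * SECONDS_PER_YEAR,
        }
        for row in kept
    ]


def load(records):
    """Serialize to JSON; a real run would send this body to the HTTP endpoint."""
    return json.dumps(records)


payload = load(transform(extract(SAMPLE_CSV)))
print(payload)
```

With these sample rows, only Alex and Fran make it through: the filter is a strict "greater than", so Elly (exactly 30) is dropped along with Bert.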

The project has all the needed documentation and examples for each ETL component.

Because ETL Runner is tied to CI/CD, the other important part is how to trigger runs and commit results back to the repository. For both, the project comes with specific documentation for the major code hosting providers (GitLab and GitHub for now).

You can find ETL-Runner on GitHub and GitLab (just fork the repository and make it your own).

Note: Keep in mind that this app isn't designed for large volumes or intensive data processing. If you plan to trigger the ETL more than twice a day, you may need to look for something more powerful. Depending on your usage, it could even go against some providers' terms of service.