2  ELT

Consider the following steps, which together establish a typical data pipeline.
1. A zip file lands in the data lake (S3/MinIO) daily.
2. Scripts on a server (EC2) download, unzip, select, and re-upload files based on mtime, and produce a file (CSV) that tracks the work done at the agreed cut-off time (cron); see the first sketch after this list. AWS Lambda is another option.
3. A Snowflake external stage (S3) is notified by a trigger file (TXT), which kicks off Snowpipe and ingests the data into the raw database as VARIANT; see the Snowpipe sketch below. Similar platforms are Databricks, Dremio, and ClickHouse. The preferred file formats are Parquet and Iceberg; ADBC (Arrow Database Connectivity) is the matching standard for moving data in and out.
4. Move data between platforms via Airbyte; see the sync-trigger sketch below.
5. Validate and transform the raw database into the mart database through dbt; see the dbt sketch below.
6. Automate the process with a task scheduler: Prefect, Airflow, or Dagster; see the Prefect sketch below.
7. Create a dashboard over the mart database via Metabase or Superset.
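
A minimal sketch of step 2, assuming boto3 against S3 (MinIO works by passing `endpoint_url` and credentials); the bucket names `landing` and `staging`, the one-day cut-off, and the CSV-only selection rule are illustrative, not from the notes.

```python
import csv
import io
import zipfile
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")  # for MinIO, pass endpoint_url= and credentials
CUTOFF = datetime.now(timezone.utc) - timedelta(days=1)

def process_daily_zips(landing="landing", staging="staging"):
    done = []
    for obj in s3.list_objects_v2(Bucket=landing).get("Contents", []):
        # select only zip files modified since the last cut-off (mtime)
        if not obj["Key"].endswith(".zip") or obj["LastModified"] < CUTOFF:
            continue
        body = s3.get_object(Bucket=landing, Key=obj["Key"])["Body"].read()
        with zipfile.ZipFile(io.BytesIO(body)) as zf:
            for member in zf.namelist():
                if member.endswith(".csv"):  # keep only the members we need
                    s3.put_object(Bucket=staging, Key=member, Body=zf.read(member))
                    done.append((obj["Key"], member))
    # tracking file (CSV) recording the work done in this run
    with open("work_done.csv", "w", newline="") as f:
        csv.writer(f).writerows([("zip", "member"), *done])

if __name__ == "__main__":
    process_daily_zips()
```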
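
A Snowpipe sketch for step 3, assuming a Snowflake account, an existing storage integration named `s3_int`, and placeholder database/schema/bucket names; the DDL is standard Snowflake SQL for an external stage, a VARIANT landing table, and an auto-ingest pipe driven by the bucket's event notifications.

```python
import os

import snowflake.connector

DDL = [
    # external stage over the staging bucket (bucket and integration are placeholders)
    """CREATE STAGE IF NOT EXISTS raw.public.staging
       URL = 's3://staging/'
       STORAGE_INTEGRATION = s3_int
       FILE_FORMAT = (TYPE = JSON)""",
    # landing table: one VARIANT column per record
    "CREATE TABLE IF NOT EXISTS raw.public.events (v VARIANT)",
    # Snowpipe: auto-ingest fires when new files arrive on the stage
    """CREATE PIPE IF NOT EXISTS raw.public.events_pipe AUTO_INGEST = TRUE AS
       COPY INTO raw.public.events FROM @raw.public.staging""",
]

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="etl_wh",
)
try:
    for stmt in DDL:
        conn.cursor().execute(stmt)
finally:
    conn.close()
```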
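
A sync-trigger sketch for step 4, assuming a local Airbyte OSS deployment exposing its config API on port 8000; the URL and connection ID are placeholders.

```python
import requests

AIRBYTE_URL = "http://localhost:8000/api/v1/connections/sync"
CONNECTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

resp = requests.post(AIRBYTE_URL, json={"connectionId": CONNECTION_ID}, timeout=30)
resp.raise_for_status()
print(resp.json()["job"]["id"])  # job id, useful for polling until the sync ends
```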
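
A dbt sketch for step 5 using dbt's programmatic entry point (dbt-core 1.5+), assuming a dbt project on disk whose `mart` selector names the mart models; `dbt test` supplies the validation half of the step.

```python
from dbt.cli.main import dbtRunner

runner = dbtRunner()
# transform raw -> mart, then run the schema tests that validate the result
for args in (["run", "--select", "mart"], ["test", "--select", "mart"]):
    result = runner.invoke(args)
    if not result.success:
        raise SystemExit(f"dbt {args[0]} failed")
```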
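
A Prefect sketch for step 6 (the same shape works in Airflow or Dagster), assuming the task bodies call the functions from the sketches above; the flow name and cron expression are placeholders for the agreed cut-off time.

```python
from prefect import flow, task

@task(retries=2)
def extract():
    ...  # step 2: select/unzip/upload, write the tracking CSV

@task
def sync():
    ...  # step 4: trigger the Airbyte sync and wait for the job

@task
def transform():
    ...  # step 5: dbt run + dbt test

@flow(log_prints=True)
def daily_pipeline():
    extract()
    sync()
    transform()

if __name__ == "__main__":
    # serve with a cron schedule instead of an ad-hoc crontab entry
    daily_pipeline.serve(name="daily-elt", cron="0 6 * * *")
```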

DuckDB cloud

DuckDB terminal