As your data projects grow, they often involve more than one piece, like a database and a script. Running everything by hand can get tedious and error-prone. One service needs to start before another. A missed environment variable can break the whole flow.
Docker Compose makes this easier. It lets you define your full setup in a single file and run everything with a single command.
In this tutorial, you'll build a simple ETL (Extract, Transform, Load) workflow using Compose. It includes two services:
- a PostgreSQL container that stores product data,
- and a Python container that loads and processes that data.
You'll learn how to define multi-container apps, connect services, and test your full stack locally, all with a single Compose command.
If you completed the previous Docker tutorial, you'll recognize some parts of this setup, but you don't need that tutorial to succeed here.
What's Docker Compose?
By default, Docker runs one container at a time using docker run commands, which can get long and repetitive. That works for quick tests, but as soon as you need multiple services, or just want to avoid copy/paste mistakes, it becomes fragile.
Docker Compose simplifies this by letting you define your setup in a single file: docker-compose.yaml. That file describes each service in your app, how they connect, and how to configure them. Once that's in place, Compose handles the rest: it builds images, starts containers in the correct order, and connects everything over a shared network, all in one step.
Compose is just as useful for small setups, like a script and a database, where it leaves fewer chances for error.
To see how that works in practice, we'll start by launching a Postgres database with Compose. From there, we'll add a second container that runs a Python script and connects to the database.
Run Postgres with Docker Compose (Single Service)
Say your team is working with product data from a new vendor. You want to spin up a local PostgreSQL database so you can start writing and testing your ETL logic before deploying it elsewhere. In this early phase, it's common to start with minimal data, sometimes even a single test row, just to confirm your pipeline works end to end before wiring up real data sources.
In this section, we'll spin up a Postgres database using Compose. This sets up a local environment we can reuse as we build out the rest of the pipeline.
Before adding the Python ETL script, we'll start with just the database service. This "single service" setup gives us a clean, isolated container that persists data using a Docker volume and can be connected to using either the terminal or a GUI.
Step 1: Create a project folder
In your terminal, make a new folder for this project and move into it:
mkdir compose-demo
cd compose-demo
You'll keep all of your Compose files and scripts here.
Step 2: Write the Compose file
Inside the folder, create a new file called docker-compose.yaml and add the following content:
services:
  db:
    image: postgres:15
    container_name: local_pg
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
This defines a service named db that runs the official postgres:15 image, sets some environment variables, exposes port 5432, and uses a named volume for persistent storage.
Tip: If you already have PostgreSQL running locally, port 5432 might be in use. You can avoid conflicts by changing the host port. For example:
ports:
  - "5433:5432"
This maps port 5433 on your machine to port 5432 inside the container. You'll then need to connect to localhost:5433 instead of localhost:5432.
If you did the "Intro to Docker" tutorial, this configuration should look familiar. Here's how the two approaches compare:
| docker run command | docker-compose.yaml equivalent |
|---|---|
| --name local_pg | container_name: local_pg |
| -e POSTGRES_USER=postgres | environment: section |
| -p 5432:5432 | ports: section |
| -v pgdata:/var/lib/postgresql/data | volumes: section |
| postgres:15 | image: postgres:15 |
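For reference, the docker run command those flags come from would look roughly like this (a reconstruction for comparison; the exact command in the earlier tutorial may have differed slightly):
docker run --name local_pg \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=products \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  postgres:15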
With this Compose file in place, we've turned a long command into something easier to maintain, and we're one step away from launching our database.
Step 3: Start the container
From the same folder, run:
docker compose up
Docker will read the file, pull the Postgres image if needed, create the volume, and start the container. You should see logs in your terminal showing the database initializing. If you see a port conflict error, scroll back to Step 2 for how to change the host port.
You can now connect to the database just like before, either by:
- running docker compose exec db bash to get inside the container, or
- connecting to localhost:5432 using a GUI like DBeaver or pgAdmin.
From there, you can run psql -U postgres -d products to interact with the database.
Step 4: Shut it down
When you're done, press Ctrl+C to stop the container. This sends a signal to gracefully shut it down while keeping everything else in place, including the container and volume.
If you want to clean things up completely, run:
docker compose down
This stops and removes the container and network, but leaves the volume intact. The next time you run docker compose up, your data will still be there.
We've now launched a production-grade database using a single command! Next, we'll write a Python script to connect to this database and run a simple data operation.
Write a Python ETL Script
In the previous Docker tutorial, we loaded a CSV file into Postgres using the command line. That works well when the file is clean and the schema is known, but sometimes we need to inspect, validate, or transform the data before loading it.
This is where Python becomes useful.
In this step, we'll write a small ETL script that connects to the Postgres container and inserts a new row. It simulates the kind of insert logic you'd run on a schedule, and keeps the focus on how Compose helps coordinate it.
We'll start by writing and testing the script locally, then containerize it and add it to our Compose setup.
Step 1: Install Python dependencies
To connect to a PostgreSQL database from Python, we'll use a library called psycopg2. It's a reliable, widely used driver that lets our script execute SQL queries, manage transactions, and handle database errors.
We'll be using the psycopg2-binary distribution, which includes all necessary build dependencies and is easier to install.
From your terminal, run:
pip install psycopg2-binary
This installs the package locally so you can run and test your script before containerizing it. Later, you'll include the same package inside your Docker image.
Step 2: Start building the script
Create a new file in the same folder called app.py. You'll build your script step by step.
Start by importing the required libraries and setting up your connection settings:
import psycopg2
import os
Note: We're importing psycopg2 even though we installed psycopg2-binary. What's going on here?
The psycopg2-binary package installs the same core psycopg2 library, just bundled with precompiled dependencies so it's easier to install. You still import it as psycopg2 in your code because that's the actual library name. The -binary part just refers to how it's packaged, not how you use it.
Next, in the same app.py file, define the database connection settings. These will be read from environment variables that Docker Compose supplies when the script runs in a container.
If you're testing locally, you can override them by setting the variables inline when running the script (we'll see an example shortly).
Add the following lines:
db_host = os.getenv("DB_HOST", "db")
db_port = os.getenv("DB_PORT", "5432")
db_name = os.getenv("POSTGRES_DB", "products")
db_user = os.getenv("POSTGRES_USER", "postgres")
db_pass = os.getenv("POSTGRES_PASSWORD", "postgres")
Tip: If you changed the host port in your Compose file (for example, to 5433:5432), be sure to set DB_PORT=5433 when testing locally, or the connection may fail.
To override the host when testing locally:
DB_HOST=localhost python app.py
To override both the host and port:
DB_HOST=localhost DB_PORT=5433 python app.py
We use "db" as the default hostname because that's the name of the Postgres service in your Compose file. When the pipeline runs inside Docker, Compose connects both containers to the same private network, and the db hostname automatically resolves to the correct container.
Step 3: Insert a new row
Rather than loading a dataset from CSV or SQL, you'll write a simple ETL operation that inserts a single new row into the vegetables table. This simulates a small "load" job like one you might run on a schedule to append new data to a growing table.
Add the following code to app.py:
new_vegetable = ("Parsnips", "Fresh", 2.42, 2.19)
This tuple matches the schema of the table you'll create in the next step.
Step 4: Connect to Postgres and insert the row
Now add the logic to connect to the database and run the insert:
try:
    conn = psycopg2.connect(
        host=db_host,
        port=int(db_port),  # Cast to int since env vars are strings
        dbname=db_name,
        user=db_user,
        password=db_pass
    )
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS vegetables (
            id SERIAL PRIMARY KEY,
            name TEXT,
            type TEXT,
            retail_price NUMERIC,
            cup_equivalent_price NUMERIC
        );
    """)
    cur.execute(
        """
        INSERT INTO vegetables (name, type, retail_price, cup_equivalent_price)
        VALUES (%s, %s, %s, %s);
        """,
        new_vegetable
    )
    conn.commit()
    cur.close()
    conn.close()
    print("ETL complete. 1 row inserted.")
except Exception as e:
    print("Error during ETL:", e)
This code connects to the database using the environment variable settings you defined earlier.
It then creates the vegetables table (if it doesn't exist) and inserts the sample row you defined above.
If the table already exists, Postgres will leave it alone thanks to CREATE TABLE IF NOT EXISTS. This makes the script safe to run more than once without breaking.
Note: This script will insert a new row every time it runs, even if the row is identical. That's expected in this example, since we're focusing on how Compose coordinates services, not on deduplication logic. In a real ETL pipeline, you'd typically add logic to avoid duplicates using techniques like:
- checking for existing records before insert,
- using ON CONFLICT clauses,
- or clearing the table first with TRUNCATE.
We'll cover these patterns in a future tutorial, but a minimal sketch of the ON CONFLICT approach is shown below.
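The sketch assumes you add a UNIQUE constraint on the name column, which the tutorial's schema does not define, so treat it as an illustration rather than a drop-in change:
# Hypothetical variation: skip the insert if a vegetable with the same name already exists.
# Requires a unique constraint, e.g.:
#   ALTER TABLE vegetables ADD CONSTRAINT uq_vegetables_name UNIQUE (name);
cur.execute(
    """
    INSERT INTO vegetables (name, type, retail_price, cup_equivalent_price)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (name) DO NOTHING;
    """,
    new_vegetable
)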
Step 5: Run the script
If you shut down your Postgres container in the previous step, you'll need to start it again before running the script. From your project folder, run:
docker compose up -d
The -d flag stands for "detached." It tells Docker to start the container and return control to your terminal so you can run other commands, like testing your Python script.
Once the database is running, test your script by running:
python app.py
If everything is working, you should see output like:
ETL complete. 1 row inserted.
If you get an error like:
could not translate host name "db" to address: No such host is known
it means the script can't find the database. Scroll back to Step 2 for how to override the hostname when testing locally.
You can verify the results by connecting to the database service and running a quick SQL query. If your Compose setup is still running in the background, run:
docker compose exec db psql -U postgres -d products
This opens a psql session inside the running container. Then try:
SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;
You should see the newest row, Parsnips, in the results. To exit the session, type \q.
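If the insert worked, the query output should look something like this (the id value depends on how many times you've run the script):
 id |   name   | type  | retail_price | cup_equivalent_price
----+----------+-------+--------------+----------------------
  1 | Parsnips | Fresh |         2.42 |                 2.19
(1 row)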
In the next step, you'll containerize this Python script, add it to your Compose setup, and run the whole ETL pipeline with a single command.
Build a Custom Docker Image for the ETL App
So far, you've written a Python script that runs locally and connects to a containerized Postgres database. Now you'll containerize the script itself, so it can run anywhere, even as part of a larger pipeline.
Before we build it, let's quickly review the difference between a Docker image and a Docker container. A Docker image is a blueprint for a container. It defines everything the container needs: the base operating system, installed packages, environment variables, files, and the command to run. When you run an image, Docker creates a live, isolated environment called a container.
You've already used prebuilt images like postgres:15. Now you'll build your own.
Step 1: Create a Dockerfile
Inside your compose-demo folder, create a new file called Dockerfile (no file extension). Then add the following:
FROM python:3.10-slim
WORKDIR /app
COPY app.py .
RUN pip install psycopg2-binary
CMD ["python", "app.py"]
Let's walk through what this file does:
- FROM python:3.10-slim starts with a minimal Debian-based image that includes Python.
- WORKDIR /app creates a working directory where your code will live.
- COPY app.py . copies your script into that directory inside the container.
- RUN pip install psycopg2-binary installs the same Postgres driver you used locally.
- CMD ["python", "app.py"] sets the default command that will run when the container starts.
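As a side note, a common variation (not required for this tutorial) is to list dependencies in a requirements.txt file and copy it before the application code, so Docker can cache the installed packages between builds when only app.py changes:
# Hypothetical variation: install dependencies first to take advantage of layer caching.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
Here requirements.txt would contain a single line: psycopg2-binary.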
Step 2: Build the image
To build the image, run this from the same folder as your Dockerfile:
docker build -t etl-app .
This command:
- Uses the current folder (.) as the build context
- Looks for a file called Dockerfile
- Tags the resulting image with the name etl-app
Once the build completes, confirm that it worked:
docker images
You should see etl-app listed in the output.
Step 3: Try running the container
Now try running your new container:
docker run etl-app
This will start the container and run the script, but unless your Postgres container is still running, it will likely fail with a connection error.
That's expected.
Right now, the Python container doesn't know how to find the database because there's no shared network, no environment variables, and no Compose setup. You'll fix that in the next step by adding both services to a single Compose file.
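For the curious, you could wire this up by hand with docker run flags, which is exactly the boilerplate Compose is about to remove. A rough sketch, assuming the database is up and Compose named its default network compose-demo_default (Compose derives the name from the project folder; check the actual name with docker network ls):
docker run --rm \
  --network compose-demo_default \
  -e DB_HOST=db \
  -e POSTGRES_DB=products \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  etl-app
We won't use this approach going forward; it's only here to show what Compose automates.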
Update the docker-compose.yaml
Earlier in the tutorial, we used Docker Compose to define and run a single service: a Postgres database. Now that our ETL app is containerized, we'll update our existing docker-compose.yaml file to run both services, the database and the app, in a single, connected setup.
Docker Compose will handle building the app, starting both containers, connecting them over a shared network, and passing the right environment variables, all in one command. This setup makes it easy to swap out the app or run different versions just by updating the docker-compose.yaml file.
Step 1: Add the app service to your Compose file
Open docker-compose.yaml and add the following under the existing services: section:
  app:
    build: .
    depends_on:
      - db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
      DB_HOST: db
This tells Docker to:
- Build the app using the Dockerfile in your current folder
- Wait for the database to start before running
- Pass in environment variables so the app can connect to the Postgres container
You don't need to modify the db service or the volumes: section; leave those as they are. The complete file should now look roughly like the version shown below.
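Here's the full docker-compose.yaml at this point, assuming you kept the default 5432:5432 port mapping from earlier:
services:
  db:
    image: postgres:15
    container_name: local_pg
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  app:
    build: .
    depends_on:
      - db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
      DB_HOST: db
volumes:
  pgdata: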
Step 2: Run and verify the full stack
With both services defined, we can now start the full pipeline with a single command:
docker compose up --build -d
This will rebuild our app image (if needed), launch both containers in the background, and connect them over a shared network.
Once the containers are up, check the logs from your app container to verify that it ran successfully:
docker compose logs app
Look for this line:
ETL complete. 1 row inserted.
That means the app container was able to connect to the database and run its logic successfully.
If you get a database connection error, try running the command again. Compose's depends_on ensures the database starts first, but it doesn't wait for the database to be ready to accept connections. In production, you'd use retry logic or a wait-for-it script to handle this more gracefully; a small retry sketch is shown below.
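One simple option, shown here as a sketch rather than as part of the tutorial's app.py, is to retry the connection a few times inside the script before giving up:
import os
import time

import psycopg2

def connect_with_retry(retries=10, delay=2):
    # Try to connect several times so the app tolerates a database that is still starting up.
    for attempt in range(1, retries + 1):
        try:
            return psycopg2.connect(
                host=os.getenv("DB_HOST", "db"),
                port=int(os.getenv("DB_PORT", "5432")),
                dbname=os.getenv("POSTGRES_DB", "products"),
                user=os.getenv("POSTGRES_USER", "postgres"),
                password=os.getenv("POSTGRES_PASSWORD", "postgres"),
            )
        except psycopg2.OperationalError:
            print(f"Database not ready (attempt {attempt}/{retries}); retrying in {delay}s...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to the database after several attempts")

Recent Compose versions can also handle this on the Compose side with a healthcheck on the db service plus depends_on using condition: service_healthy, but the retry loop keeps the example self-contained.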
To confirm the row was actually inserted into the database, open a psql session inside the running container:
docker compose exec db psql -U postgres -d products
Then run a quick SQL query:
SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;
You should see your most recent row (Parsnips) in the output. Type \q to exit.
Step 3: Shut it down
When you're done testing, stop and remove the containers with:
docker compose down
This tears down both containers but leaves your named volume (pgdata) intact, so your data will still be there next time you start things up.
Clean Up and Reuse
To run your pipeline again, just restart the services:
docker compose up
Because your Compose setup uses a named volume (pgdata), your database will retain its data between runs, even after shutting everything down.
Each time you restart the pipeline, the app container will re-run the script and insert the same row unless you update the script logic. In a real pipeline, you'd typically prevent that with checks, truncation, or ON CONFLICT clauses.
You can now test, tweak, and reuse this setup as many times as needed.
Push Your App Image to Docker Hub (optional)
So far, our ETL app runs locally. But what if we want to run it on another machine, share it with a teammate, or deploy it to the cloud?
Docker makes that easy through container registries, which are places where we can store and share Docker images. The most common registry is Docker Hub, which offers free accounts and public repositories. Note that this step is optional and mostly useful if you want to experiment with sharing your image or using it on another computer.
Step 1: Create a Docker Hub account
If you don't have one yet, go to hub.docker.com and sign up for a free account. Once you're in, you can create a new repository (for example, etl-app).
Step 2: Tag your image
Docker images need to be tagged with your username and repository name before you can push them. For example, if your username is myname, run:
docker tag etl-app myname/etl-app:latest
This gives your local image a new name that points to your Docker Hub account.
Step 3: Push the image
Log in from your terminal:
docker login
Then push the image:
docker push myname/etl-app:latest
Once it's uploaded, you (or anyone else) can pull and run the image from anywhere:
docker pull myname/etl-app:latest
This is especially useful if you want to:
- Share your ETL container with collaborators
- Use it in cloud deployments or CI pipelines
- Back up your work in a versioned registry
If you're not ready to create an account, you can skip this step; your image will still work locally as part of your Compose setup.
Wrap-Up and Next Steps
You've built and containerized a complete data pipeline using Docker Compose.
Along the way, you learned how to:
- Build and run custom Docker images
- Define multi-service environments with a Compose file
- Pass environment variables and connect services
- Use volumes for persistent storage
- Run, inspect, and reuse your full stack with one command
This setup mirrors how real-world data pipelines are often prototyped and tested, because Compose gives you a reliable, repeatable way to build and share these workflows.
Where to go next
Here are a few ideas for expanding your project:
- Schedule your pipeline: Use something like Airflow to run the job on a schedule.
- Add logging or alerts: Log ETL status to a file or send notifications if something fails.
- Transform data or add validations: Add more steps to your script to clean, enrich, or validate incoming data.
- Write tests: Validate that your script does what you expect, especially as it grows.
- Connect to real-world data sources: Pull from APIs or cloud storage buckets and load the results into Postgres.
Once you're comfortable with Compose, you'll be able to spin up production-like environments in seconds, a big win for testing, onboarding, and deployment.