As your data projects grow, they often involve more than one piece, like a database and a script. Running everything by hand can get tedious and error-prone. One service needs to start before another. A missed environment variable can break the whole flow.
Docker Compose makes this easier. It lets you define your full setup in a single file and run everything with a single command.
In this tutorial, you'll build a simple ETL (Extract, Transform, Load) workflow using Compose. It includes two services:
- a PostgreSQL container that stores product data,
- and a Python container that loads and processes that data.
You'll learn how to define multi-container apps, connect services, and test your full stack locally, all with a single Compose command.
If you completed the previous Docker tutorial, you'll recognize some parts of this setup, but you don't need that tutorial to succeed here.
What's Docker Compose?
By default, Docker runs one container at a time using docker run commands, which can get long and repetitive. That works for quick tests, but as soon as you need multiple services, or just want to avoid copy/paste mistakes, it becomes fragile.
Docker Compose simplifies this by letting you define your setup in a single file: docker-compose.yaml. That file describes each service in your app, how they connect, and how to configure them. Once that's in place, Compose handles the rest: it builds images, starts containers in the correct order, and connects everything over a shared network, all in one step.
Compose is just as useful for small setups, like a script and a database, where it leaves fewer chances for error.
To see how that works in practice, we'll start by launching a Postgres database with Compose. From there, we'll add a second container that runs a Python script and connects to the database.
Run Postgres with Docker Compose (Single Service)
Say your team is working with product data from a new vendor. You want to spin up a local PostgreSQL database so you can start writing and testing your ETL logic before deploying it elsewhere. In this early phase, it's common to start with minimal data, sometimes even a single test row, just to confirm your pipeline works end to end before wiring up real data sources.
In this section, we'll spin up a Postgres database using Compose. This sets up a local environment we can reuse as we build out the rest of the pipeline.
Before adding the Python ETL script, we'll start with just the database service. This "single service" setup gives us a clean, isolated container that persists data using a Docker volume and can be connected to using either the terminal or a GUI.
Step 1: Create a project folder
In your terminal, make a new folder for this project and move into it:
mkdir compose-demo
cd compose-demo
You'll keep all of your Compose files and scripts here.
Step 2: Write the Compose file
Inside the folder, create a new file called docker-compose.yaml and add the following content:
services:
  db:
    image: postgres:15
    container_name: local_pg
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
This defines a service named db that runs the official postgres:15 image, sets some environment variables, exposes port 5432, and uses a named volume for persistent storage.
Tip: If you already have PostgreSQL running locally, port 5432 might be in use. You can avoid conflicts by changing the host port. For example:
ports:
  - "5433:5432"
This maps port 5433 on your machine to port 5432 inside the container. You'll then need to connect to localhost:5433 instead of localhost:5432.
If you did the "Intro to Docker" tutorial, this configuration should look familiar. Here's how the two approaches compare:
| docker run command | docker-compose.yaml equivalent |
|---|---|
| --name local_pg | container_name: local_pg |
| -e POSTGRES_USER=postgres | environment: section |
| -p 5432:5432 | ports: section |
| -v pgdata:/var/lib/postgresql/data | volumes: section |
| postgres:15 | image: postgres:15 |
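For reference, the docker run command those flags come from would look roughly like this (a reconstruction for comparison; the exact command in the earlier tutorial may have differed slightly):
docker run --name local_pg \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=products \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  postgres:15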
With this Compose file in place, we've turned a long command into something easier to maintain, and we're one step away from launching our database.
Step 3: Start the container
From the same folder, run:
docker compose up
Docker will read the file, pull the Postgres image if needed, create the volume, and start the container. You should see logs in your terminal showing the database initializing. If you see a port conflict error, scroll back to Step 2 for how to change the host port.
You can now connect to the database just like before, either by:
- running docker compose exec db bash to get inside the container, or
- connecting to localhost:5432 using a GUI like DBeaver or pgAdmin.
From there, you can run psql -U postgres -d products to interact with the database.
Step 4: Shut it down
When you're done, press Ctrl+C to stop the container. This sends a signal to gracefully shut it down while keeping everything else in place, including the container and volume.
If you want to clean things up completely, run:
docker compose down
This stops and removes the container and network, but leaves the volume intact. The next time you run docker compose up, your data will still be there.
We've now launched a production-grade database using a single command! Next, we'll write a Python script to connect to this database and run a simple data operation.
Write a Python ETL Script
In the previous Docker tutorial, we loaded a CSV file into Postgres using the command line. That works well when the file is clean and the schema is known, but sometimes we need to inspect, validate, or transform the data before loading it.
This is where Python becomes useful.
In this step, we'll write a small ETL script that connects to the Postgres container and inserts a new row. It simulates the kind of insert logic you'd run on a schedule, and keeps the focus on how Compose helps coordinate it.
We'll start by writing and testing the script locally, then containerize it and add it to our Compose setup.
Step 1: Install Python dependencies
To connect to a PostgreSQL database from Python, we'll use a library called psycopg2. It's a reliable, widely used driver that lets our script execute SQL queries, manage transactions, and handle database errors.
We'll be using the psycopg2-binary distribution, which includes all necessary build dependencies and is easier to install.
From your terminal, run:
pip install psycopg2-binary
This installs the package locally so you can run and test your script before containerizing it. Later, you'll include the same package inside your Docker image.
Step 2: Start building the script
Create a new file in the same folder called app.py. You'll build your script step by step.
Start by importing the required libraries and setting up your connection settings:
import psycopg2
import os
Note: We're importing psycopg2 even though we installed psycopg2-binary. What's going on here?
The psycopg2-binary package installs the same core psycopg2 library, just bundled with precompiled dependencies so it's easier to install. You still import it as psycopg2 in your code because that's the actual library name. The -binary part just refers to how it's packaged, not how you use it.
Next, in the same app.py file, define the database connection settings. These will be read from environment variables that Docker Compose supplies when the script runs in a container.
If you're testing locally, you can override them by setting the variables inline when running the script (we'll see an example shortly).
Add the following lines:
db_host = os.getenv("DB_HOST", "db")
db_port = os.getenv("DB_PORT", "5432")
db_name = os.getenv("POSTGRES_DB", "products")
db_user = os.getenv("POSTGRES_USER", "postgres")
db_pass = os.getenv("POSTGRES_PASSWORD", "postgres")
Tip: If you changed the host port in your Compose file (for example, to 5433:5432), be sure to set DB_PORT=5433 when testing locally, or the connection may fail.
To override the host when testing locally:
DB_HOST=localhost python app.py
To override both the host and port:
DB_HOST=localhost DB_PORT=5433 python app.py
We use "db" as the default hostname because that's the name of the Postgres service in your Compose file. When the pipeline runs inside Docker, Compose connects both containers to the same private network, and the db hostname automatically resolves to the correct container.
Step 3: Insert a new row
Rather than loading a dataset from CSV or SQL, you'll write a simple ETL operation that inserts a single new row into the vegetables table. This simulates a small "load" job like one you might run on a schedule to append new data to a growing table.
Add the following code to app.py:
new_vegetable = ("Parsnips", "Fresh", 2.42, 2.19)
This tuple matches the schema of the table you'll create in the next step.
Step 4: Connect to Postgres and insert the row
Now add the logic to connect to the database and run the insert:
try:
    conn = psycopg2.connect(
        host=db_host,
        port=int(db_port),  # Cast to int since env vars are strings
        dbname=db_name,
        user=db_user,
        password=db_pass
    )
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS vegetables (
            id SERIAL PRIMARY KEY,
            name TEXT,
            type TEXT,
            retail_price NUMERIC,
            cup_equivalent_price NUMERIC
        );
    """)
    cur.execute(
        """
        INSERT INTO vegetables (name, type, retail_price, cup_equivalent_price)
        VALUES (%s, %s, %s, %s);
        """,
        new_vegetable
    )
    conn.commit()
    cur.close()
    conn.close()
    print("ETL complete. 1 row inserted.")
except Exception as e:
    print("Error during ETL:", e)
This code connects to the database using the environment variable settings you defined earlier.
It then creates the vegetables table (if it doesn't exist) and inserts the sample row you defined above.
If the table already exists, Postgres will leave it alone thanks to CREATE TABLE IF NOT EXISTS. This makes the script safe to run more than once without breaking.
Note: This script will insert a new row every time it runs, even if the row is identical. That's expected in this example, since we're focusing on how Compose coordinates services, not on deduplication logic. In a real ETL pipeline, you'd typically add logic to avoid duplicates using techniques like:
- checking for existing records before insert,
- using ON CONFLICT clauses,
- or clearing the table first with TRUNCATE.
We'll cover these patterns in a future tutorial, but a minimal sketch of the ON CONFLICT approach is shown below.
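The sketch assumes you add a UNIQUE constraint on the name column, which the tutorial's schema does not define, so treat it as an illustration rather than a drop-in change:
# Hypothetical variation: skip the insert if a vegetable with the same name already exists.
# Requires a unique constraint, e.g.:
#   ALTER TABLE vegetables ADD CONSTRAINT uq_vegetables_name UNIQUE (name);
cur.execute(
    """
    INSERT INTO vegetables (name, type, retail_price, cup_equivalent_price)
    VALUES (%s, %s, %s, %s)
    ON CONFLICT (name) DO NOTHING;
    """,
    new_vegetable
)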
Step 5: Run the script
If you shut down your Postgres container in the previous step, you'll need to start it again before running the script. From your project folder, run:
docker compose up -d
The -d flag stands for "detached." It tells Docker to start the container and return control to your terminal so you can run other commands, like testing your Python script.
Once the database is running, test your script by running:
python app.py
If everything is working, you should see output like:
ETL complete. 1 row inserted.
If you get an error like:
could not translate host name "db" to address: No such host is known
it means the script can't find the database. Scroll back to Step 2 for how to override the hostname when testing locally.
You can verify the results by connecting to the database service and running a quick SQL query. If your Compose setup is still running in the background, run:
docker compose exec db psql -U postgres -d products
This opens a psql session inside the running container. Then try:
SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;
You should see the newest row, Parsnips, in the results. To exit the session, type \q.
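If the insert worked, the query output should look something like this (the id value depends on how many times you've run the script):
 id |   name   | type  | retail_price | cup_equivalent_price
----+----------+-------+--------------+----------------------
  1 | Parsnips | Fresh |         2.42 |                 2.19
(1 row)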
In the next step, you'll containerize this Python script, add it to your Compose setup, and run the whole ETL pipeline with a single command.
Build a Custom Docker Image for the ETL App
So far, you've written a Python script that runs locally and connects to a containerized Postgres database. Now you'll containerize the script itself, so it can run anywhere, even as part of a larger pipeline.
Before we build it, let's quickly review the difference between a Docker image and a Docker container. A Docker image is a blueprint for a container. It defines everything the container needs: the base operating system, installed packages, environment variables, files, and the command to run. When you run an image, Docker creates a live, isolated environment called a container.
You've already used prebuilt images like postgres:15. Now you'll build your own.
Step 1: Create a Dockerfile
Inside your compose-demo folder, create a new file called Dockerfile (no file extension). Then add the following:
FROM python:3.10-slim
WORKDIR /app
COPY app.py .
RUN pip install psycopg2-binary
CMD ["python", "app.py"]
Let's walk through what this file does:
- FROM python:3.10-slim starts with a minimal Debian-based image that includes Python.
- WORKDIR /app creates a working directory where your code will live.
- COPY app.py . copies your script into that directory inside the container.
- RUN pip install psycopg2-binary installs the same Postgres driver you used locally.
- CMD ["python", "app.py"] sets the default command that will run when the container starts.
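As a side note, a common variation (not required for this tutorial) is to list dependencies in a requirements.txt file and copy it before the application code, so Docker can cache the installed packages between builds when only app.py changes:
# Hypothetical variation: install dependencies first to take advantage of layer caching.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
Here requirements.txt would contain a single line: psycopg2-binary.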
Step 2: Build the image
To build the image, run this from the same folder as your Dockerfile:
docker build -t etl-app .
This command:
- Uses the current folder (.) as the build context
- Looks for a file called Dockerfile
- Tags the resulting image with the name etl-app
Once the build completes, confirm that it worked:
docker images
You should see etl-app listed in the output.
Step 3: Try running the container
Now try running your new container:
docker run etl-app
This will start the container and run the script, but unless your Postgres container is still running, it will likely fail with a connection error.
That's expected.
Right now, the Python container doesn't know how to find the database because there's no shared network, no environment variables, and no Compose setup. You'll fix that in the next step by adding both services to a single Compose file.
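For the curious, you could wire this up by hand with docker run flags, which is exactly the boilerplate Compose is about to remove. A rough sketch, assuming the database is up and Compose named its default network compose-demo_default (Compose derives the name from the project folder; check the actual name with docker network ls):
docker run --rm \
  --network compose-demo_default \
  -e DB_HOST=db \
  -e POSTGRES_DB=products \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  etl-app
We won't use this approach going forward; it's only here to show what Compose automates.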
Update the docker-compose.yaml
Earlier in the tutorial, we used Docker Compose to define and run a single service: a Postgres database. Now that our ETL app is containerized, we'll update our existing docker-compose.yaml file to run both services, the database and the app, in a single, connected setup.
Docker Compose will handle building the app, starting both containers, connecting them over a shared network, and passing the right environment variables, all in one command. This setup makes it easy to swap out the app or run different versions just by updating the docker-compose.yaml file.
Step 1: Add the app service to your Compose file
Open docker-compose.yaml and add the following under the existing services: section:
  app:
    build: .
    depends_on:
      - db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
      DB_HOST: db
This tells Docker to:
- Build the app using the Dockerfile in your current folder
- Wait for the database to start before running
- Pass in environment variables so the app can connect to the Postgres container
You don't need to modify the db service or the volumes: section; leave those as they are. The complete file should now look roughly like the version shown below.
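Here's the full docker-compose.yaml at this point, assuming you kept the default 5432:5432 port mapping from earlier:
services:
  db:
    image: postgres:15
    container_name: local_pg
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  app:
    build: .
    depends_on:
      - db
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: products
      DB_HOST: db
volumes:
  pgdata: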
Step 2: Run and verify the full stack
With both services defined, we can now start the full pipeline with a single command:
docker compose up --build -d
This will rebuild our app image (if needed), launch both containers in the background, and connect them over a shared network.
Once the containers are up, check the logs from your app container to verify that it ran successfully:
docker compose logs app
Look for this line:
ETL complete. 1 row inserted.
That means the app container was able to connect to the database and run its logic successfully.
If you get a database connection error, try running the command again. Compose's depends_on ensures the database starts first, but it doesn't wait for the database to be ready to accept connections. In production, you'd use retry logic or a wait-for-it script to handle this more gracefully; a small retry sketch is shown below.
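One simple option, shown here as a sketch rather than as part of the tutorial's app.py, is to retry the connection a few times inside the script before giving up:
import os
import time

import psycopg2

def connect_with_retry(retries=10, delay=2):
    # Try to connect several times so the app tolerates a database that is still starting up.
    for attempt in range(1, retries + 1):
        try:
            return psycopg2.connect(
                host=os.getenv("DB_HOST", "db"),
                port=int(os.getenv("DB_PORT", "5432")),
                dbname=os.getenv("POSTGRES_DB", "products"),
                user=os.getenv("POSTGRES_USER", "postgres"),
                password=os.getenv("POSTGRES_PASSWORD", "postgres"),
            )
        except psycopg2.OperationalError:
            print(f"Database not ready (attempt {attempt}/{retries}); retrying in {delay}s...")
            time.sleep(delay)
    raise RuntimeError("Could not connect to the database after several attempts")

Recent Compose versions can also handle this on the Compose side with a healthcheck on the db service plus depends_on using condition: service_healthy, but the retry loop keeps the example self-contained.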
To confirm the row was actually inserted into the database, open a psql session inside the running container:
docker compose exec db psql -U postgres -d products
Then run a quick SQL query:
SELECT * FROM vegetables ORDER BY id DESC LIMIT 5;
You should see your most recent row (Parsnips) in the output. Type \q to exit.
Step 3: Shut it down
When you're done testing, stop and remove the containers with:
docker compose down
This tears down both containers but leaves your named volume (pgdata) intact, so your data will still be there next time you start things up.
Clean Up and Reuse
To run your pipeline again, just restart the services:
docker compose up
Because your Compose setup uses a named volume (pgdata), your database will retain its data between runs, even after shutting everything down.
Each time you restart the pipeline, the app container will re-run the script and insert the same row unless you update the script logic. In a real pipeline, you'd typically prevent that with checks, truncation, or ON CONFLICT clauses.
You can now test, tweak, and reuse this setup as many times as needed.
Push Your App Image to Docker Hub (optional)
So far, our ETL app runs locally. But what if we want to run it on another machine, share it with a teammate, or deploy it to the cloud?
Docker makes that easy through container registries, which are places where we can store and share Docker images. The most common registry is Docker Hub, which offers free accounts and public repositories. Note that this step is optional and mostly useful if you want to experiment with sharing your image or using it on another computer.
Step 1: Create a Docker Hub account
If you don't have one yet, go to hub.docker.com and sign up for a free account. Once you're in, you can create a new repository (for example, etl-app).
Step 2: Tag your image
Docker images need to be tagged with your username and repository name before you can push them. For example, if your username is myname, run:
docker tag etl-app myname/etl-app:latest
This gives your local image a new name that points to your Docker Hub account.
Step 3: Push the image
Log in from your terminal:
docker login
Then push the image:
docker push myname/etl-app:latest
Once it's uploaded, you (or anyone else) can pull and run the image from anywhere:
docker pull myname/etl-app:latest
This is especially useful if you want to:
- Share your ETL container with collaborators
- Use it in cloud deployments or CI pipelines
- Back up your work in a versioned registry
If you're not ready to create an account, you can skip this step; your image will still work locally as part of your Compose setup.
Wrap-Up and Next Steps
You've built and containerized a complete data pipeline using Docker Compose.
Along the way, you learned how to:
- Build and run custom Docker images
- Define multi-service environments with a Compose file
- Pass environment variables and connect services
- Use volumes for persistent storage
- Run, inspect, and reuse your full stack with one command
This setup mirrors how real-world data pipelines are often prototyped and tested, because Compose gives you a reliable, repeatable way to build and share these workflows.
Where to go next
Here are a few ideas for expanding your project:
- Schedule your pipeline: Use something like Airflow to run the job on a schedule.
- Add logging or alerts: Log ETL status to a file or send notifications if something fails.
- Transform data or add validations: Add more steps to your script to clean, enrich, or validate incoming data.
- Write tests: Validate that your script does what you expect, especially as it grows.
- Connect to real-world data sources: Pull from APIs or cloud storage buckets and load the results into Postgres.
Once you're comfortable with Compose, you'll be able to spin up production-like environments in seconds, a big win for testing, onboarding, and deployment.