How to Install Apache Airflow on Mac
Apache Airflow is a popular workflow management platform for automating tasks such as data processing, machine learning, and data analysis. It is a powerful tool for managing complex workflows, but it can be daunting to install and set up, especially for Mac users who want to run it locally, like me 🥲.
In this article, we will walk through the steps to install Airflow locally on a Mac.
First of all, like me, you may have followed the documentation, tried to install it, and received an error similar to this:
Using cached google-re2-1.0.tar.gz (9.8 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: google-re2
Building wheel for google-re2 (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-39
copying re2.py -> build/lib.macosx-10.9-x86_64-cpython-39
running build_ext
building '_re2' extension
creating build/temp.macosx-10.9-x86_64-cpython-39
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/tejas/opt/anaconda3/envs/project_name/include -fPIC -O2 -isystem /Users/tejas/opt/anaconda3/envs/project_name/include -I/Users/tejas/opt/anaconda3/envs/project_name/include/python3.9 -c _re2.cc -o build/temp.macosx-10.9-x86_64-cpython-39/_re2.o -fvisibility=hidden
_re2.cc:11:10: fatal error: 'pybind11/pybind11.h' file not found
#include <pybind11/pybind11.h>
^~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for google-re2
Running setup.py clean for google-re2
Failed to build google-re2
ERROR: Could not build wheels for google-re2, which is required to install pyproject.toml-based projects
I tried to find the problem in Airflow’s GitHub Issues, but God forbid anyone should end up there… The people there have no mercy when they reply… (I will leave a few issue links at the end of the article.)
Anyway, let’s install Airflow!! 🥳
Prerequisites
Before you begin, you will need the following:
- Python 3.9 (the Conda environment in Step 1 installs it for you. Tbh, I haven’t tried higher versions, and the constraints file in Step 2 is specific to 3.9, so fingers crossed 🤞🏻 if you change it)
- Conda
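Before going further, it can help to confirm that Conda is actually on your PATH. This is just a small sketch; `check_cmd` is a helper name I made up for this article, not part of Conda or Airflow.

```shell
# Report whether a command is available on PATH.
# check_cmd is a hypothetical helper, not part of Conda or Airflow.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# "|| true" keeps the script going even if the tool is missing,
# so you see the report instead of an abrupt exit.
check_cmd conda || true
```

If it prints "missing: conda", install Conda (or open a new terminal so your shell picks it up) before moving on to Step 1.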
Step 1: Create a new Conda Environment
conda create --name airflow_env python=3.9 -y
conda activate airflow_env
Step 2: Install Apache Airflow
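The constraints URL comes from the official Airflow GitHub repo. It pins the dependencies to versions that were tested with this Airflow release on Python 3.9.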
pip install "apache-airflow==2.2.3" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.3/constraints-no-providers-3.9.txt"
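The constraints URL has the Airflow version and the Python version baked into it, so the two have to match what you install. As a sketch, here is one way to build the URL from two variables instead of typing it by hand. The URL pattern is simply copied from the command above; `constraint_url` is a helper name I made up, and I have only tried the 2.2.3 / 3.9 combination.

```shell
# Build the constraints URL for a given Airflow version and Python version.
# The pattern is copied from the pip command above; only the two version
# numbers are substituted. constraint_url is a hypothetical helper name.
constraint_url() {
  airflow_version="$1"   # e.g. 2.2.3
  python_version="$2"    # e.g. 3.9 (major.minor only)
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-no-providers-%s.txt\n' \
    "$airflow_version" "$python_version"
}

AIRFLOW_VERSION="2.2.3"
PYTHON_VERSION="3.9"
CONSTRAINT_URL="$(constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"
echo "$CONSTRAINT_URL"

# The install command from this step, written with the variables:
# pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```

The pip line is left as a comment so that running the snippet only prints the URL; uncomment it once the URL looks right.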
Step 3: Set Up Airflow Database and User
You need to initialize the Airflow database. This will create the tables that Airflow needs to store its data.
To initialize the Airflow database, run the following command:
airflow db init
cd ~/airflow
After these commands, you should see these four items in the folder (three files and the logs folder):
- airflow.cfg
- airflow.db
- logs
- webserver_config.py
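If you would rather not eyeball the folder, a small check like the one below can tell you which of the four items are missing. This is only a sketch: `missing_airflow_items` is a name I made up, and it assumes your Airflow home is ~/airflow unless the AIRFLOW_HOME environment variable says otherwise.

```shell
# Print the expected items that are missing from an Airflow home folder,
# one per line. Prints nothing if all four are there.
# missing_airflow_items is a hypothetical helper, not an Airflow command.
missing_airflow_items() {
  airflow_home="$1"
  for item in airflow.cfg airflow.db logs webserver_config.py; do
    [ -e "$airflow_home/$item" ] || echo "$item"
  done
}

# Assumption: Airflow home is ~/airflow unless AIRFLOW_HOME is set.
missing_airflow_items "${AIRFLOW_HOME:-$HOME/airflow}"
```

No output means everything was created. If something is listed, rerun the database initialization and look at the error it prints.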
Next, you need an Airflow user so that you can log in to the web UI.
The following command creates a user with the Admin role.
You only need to change the first name, last name, and email parts. The username and password are both admin, which is fine for a local test.
airflow users create --username admin --password admin --firstname your_name --lastname your_lastname --role Admin --email your_email@address.com
If you have not hit any errors so far, perfect! Now we can move on to testing our Airflow server.
Step 4: Start Airflow Webserver and Scheduler
Now that you have initialized the Airflow database, you can start the Airflow webserver and scheduler. The webserver is the web-based interface that you use to manage your workflows, and the scheduler is the process that triggers them.
To start both in the background (the -D flag runs them as daemons), run the following commands:
airflow webserver -D
airflow scheduler -D
The webserver listens on port 8080 by default. You can access it by opening a web browser and navigating to http://localhost:8080 or http://0.0.0.0:8080, then logging in with the username and password from Step 3. And Voila!!!
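The webserver can take a little while to come up after you start it, so the page may not load on the first try. Here is a sketch of a small polling loop with curl that waits until a URL answers. `wait_for_url` is a helper name I made up, and the attempt count and delay are arbitrary defaults.

```shell
# Poll a URL until it answers with a success status, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# wait_for_url is a hypothetical helper, not an Airflow command.
wait_for_url() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
    # --max-time stops a single request from hanging forever.
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no answer from $url after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_url http://localhost:8080/health` polls the webserver's health page (I am assuming the default port here). If it gives up, check the log files under ~/airflow for the reason.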
Conclusion
This article walked you through installing Airflow locally on a Mac, step by step. Now that Airflow is installed, you can start creating and managing your own workflows.
For more information on Airflow, please refer to the official documentation.
I hope this helps!
References
- https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html
- https://github.com/apache/airflow
- https://github.com/google/re2/issues/437
- https://github.com/apache/airflow/issues/32849
- https://github.com/apache/airflow/discussions/32852
- https://github.com/conda-forge/google-re2-feedstock/issues/6
- https://github.com/conda-forge/airflow-feedstock/issues/114