Using python -m to invoke modules as scripts

Before you read…

This guide requires some prerequisite knowledge of using Python. If you can answer the following questions with at least some level of confidence, you can continue ahead:

  • What is a terminal? What can you use it for?

  • What is a current working directory?

  • How do you run Python scripts (.py files) from the terminal?

  • How do you make a Python script import another script?

  • How do you install third-party packages with pip? How do you use them?

You might have seen the -m flag used in various python commands online, or were told by someone else to use python -m to “run a module as a script”, but didn’t really understand what that meant. This gist covers how that flag is used, particularly in the context of package development.

Introduction to modules and packages

Say you wanted to write a command-line utility for downloading a GitHub repository release, and you started out with a single script, downloader.py. Eventually your code grew too big to be well organized in a single file, so you decided to split its functionality into separate modules:

api.py
cache.py
cli.py
main.py
    from api import download
    from cli import parser

    args = parser.parse_args()
    download(args.repository, args.filename)

If you wanted to share this with other users or re-use it in another project, they would need to download all four scripts into whatever working directory they might be in, as well as any dependencies required by your scripts:

my_project/
└── api.py, cache.py, cli.py, main.py
    # /my_project $ pip install requests
    # /my_project $ python main.py ...

This is a fairly inconvenient process. A nicer way to handle this would be packaging and uploading the code to PyPI so that users can install it with a single command:

pip install my-github-downloader
python -m my_downloader

If you want to do the same thing, the first step is to organize your code into a package, collecting your scripts into a single directory:

my_project/
└── my_downloader/
    ├── __init__.py
    ├── api.py, cache.py, cli.py
    └── main.py
            # In packages you can use relative imports:
            from .api import download
            # Though absolute imports are also valid:
            from my_downloader.cli import parser

This way, all of your tool’s scripts are contained in one directory, making them easier to distribute to other systems.

Note

Wait, why would I upload my application onto PyPI? Isn’t it only for libraries? What if I want to keep my app private?

Packages don’t have to be limited to just libraries that users import. PyPI is an easy way to distribute code to users, and that includes applications too! black, mypy, memray, and pip itself are applications distributed through the Python Package Index. Packages can also be a library and an application at the same time, like pytest.

Of course, PyPI is a public index, and anyone will see your package. Maybe you want to keep it private or you don’t think it needs to be on PyPI. In which case, you can still distribute and install your packages in other ways, such as from version control systems or from local projects.
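For example, pip can install straight from a Git repository or from a local directory (the URL and path here are hypothetical):

pip install git+https://github.com/yourname/my-github-downloader.git
pip install /path/to/my_project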

How does -m play into this?

Now that your code is organized as a package, how do you run main.py? You could try python my_downloader/main.py, but this makes Python run main.py as a standalone script, without knowledge of the package layout it resides in. As such, you lose package features like __init__.py and relative imports:

/my_project $ python my_downloader/main.py
Traceback (most recent call last):
  File "/my_project/my_downloader/main.py", line 2, in <module>
    from .api import download
ImportError: attempted relative import with no known parent package

To run a module inside a package, you should use the -m option like so:

/my_project $ python -m my_downloader.main

This essentially imports the module described by the path my_downloader.main, and sets its __name__ variable to "__main__". As a result, the my_downloader package goes through the entire import system, executing __init__.py and setting up the context for relative imports, allowing main.py to run as intended.
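If you’ve ever written the common if __name__ == "__main__": guard, this is exactly the variable that -m sets. Here’s a minimal sketch of how main.py might use it (the main() wrapper is just one common convention, not something the earlier example had):

# my_downloader/main.py (sketch)
from .api import download
from .cli import parser

def main():
    args = parser.parse_args()
    download(args.repository, args.filename)

# True when run via `python -m my_downloader.main`,
# False when this module is merely imported by another module.
if __name__ == "__main__":
    main()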

…don’t understand how importing works here? Don’t worry, I’ll cover this in a bit, but before that I want to mention using __main__.py.

Using __main__.py

Packages support another special script, __main__.py. When this is present in a package, the -m option will implicitly run that script when it’s given the name of a package instead of a .py module. We can take advantage of this to make my_downloader invokable by renaming main.py to __main__.py:

my_project/
└── my_downloader/
    ├── __init__.py
    ├── __main__.py  # contents of main.py
    └── ...
/my_project $ python -m my_downloader
# Equivalent to typing the full module path:
/my_project $ python -m my_downloader.__main__
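For completeness, here’s a sketch of what __main__.py might contain, essentially the old main.py rewritten with relative imports:

# my_downloader/__main__.py (sketch)
from .api import download
from .cli import parser

args = parser.parse_args()
download(args.repository, args.filename)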

Simple, right? Now, let’s cover imports.

What does importing a module really mean?

Note

In case you’re lost about the script / module / package terminology, let’s assume that (1) a script is a .py file you can run with python script.py, (2) a module is something you can import, and (3) a package is a specific kind of module consisting of a directory with an __init__.py. This will be sufficient for the following discussion.

You might have the understanding that scripts can import other scripts as modules alongside the ones you install with pip, and then access functions and classes from them. This mental model is generally correct. However, you may have made some assumptions about how modules are found.

When running python -m my_downloader, how does Python know where to find the my_downloader module? You might assume it always looks in the current working directory, but that isn’t always true. The exact answer is sys.path, a list of directories that Python searches when resolving imports. Using -m in python -m path.to.mod makes Python prepend your current working directory to sys.path, unlike, say, python path/to/main.py, which prepends the script’s directory, path/to/, instead of your CWD.
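You can observe this for yourself with a tiny probe script (the name probe.py is hypothetical):

# probe.py - inspect where Python will look for imports
import sys

# Under `python path/to/probe.py`, sys.path[0] is the script's directory;
# under `python -m probe`, it's the current working directory
# (shown as an absolute path or '' depending on the Python version).
print(sys.path[0])
print(sys.path)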

All absolute imports rely on sys.path.

How an import like import matplotlib gets resolved in main.py is no different from how it gets resolved in seaborn/__init__.py. What changes is the directories listed in sys.path, mainly based on your environment variables and how you run the Python interpreter.

Take for example the following layout:

CWD/
├── pkg/
│   ├── __init__.py
│   ├── foo.py
│   └── bar.py
└── main.py

It’s a common mistake to think that because pkg/foo.py and pkg/bar.py sit next to each other, either one can do import foo or import bar. Whether that works depends entirely on whether their parent directory is in sys.path. If you were to run python main.py or python -m pkg.foo, CWD/ would be in sys.path rather than pkg/ itself, meaning Python can only resolve import pkg. Therefore, to import either submodule, the import must be fully qualified: import pkg.foo and import pkg.bar.
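To make this concrete, here’s how pkg/bar.py could refer to its sibling when the project is run from CWD/ (ham and spam are hypothetical names defined in foo.py):

# pkg/bar.py (sketch)
# import foo                   # ModuleNotFoundError: pkg/ itself isn't on sys.path
import pkg.foo                 # fully qualified, resolves because CWD/ is on sys.path
from pkg.foo import ham, spam  # hypothetical names, same idea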

Hint

If you recall how relative imports are written, this is where you might use them over absolute imports!

from . import foo
from . import bar
from .foo import ham, spam

Now you don’t have to fully qualify the import because Python assumes that your relative imports start from each module’s parent package, pkg. In other words, the above relative imports become equivalent to the following absolute imports:

from pkg import foo
from pkg import bar
from pkg.foo import ham, spam

Unfortunately, relative imports can’t be used outside of a package, so you wouldn’t be permitted to, say, write from .pkg import foo inside main.py [1], or to reach beyond the top-level package with something like from .. import mod [2].

That’s why for local projects, it’s important to organize and run your scripts in a consistent manner. For example, you might put modules and scripts in the same directory and then run your scripts with python path/to/script.py:

my_project/
└── app/
    ├── layouts/
    │   ├── __init__.py
    │   └── ...
    ├── parser/
    │   ├── __init__.py
    │   └── ...
    ├── compile.py
    │       from layouts import create_layout
    │       from parser import Body, Footer, Header
    ├── generate.py
    └── validate.py
/my_project $ python app/generate.py
/my_project $ python app/validate.py
/my_project $ python app/compile.py

Or you might organize all of your scripts into a package and use python -m package.submodule:

my_project/
└── my_package/
    ├── sub_package/
    │   └── __init__.py
    │           from my_package import submodule
    ├── __init__.py
    │       from . import sub_package
    ├── __main__.py
    ├── migrate.py
    └── submodule.py
/my_project $ python -m my_package --help
/my_project $ python -m my_package.migrate --input foo.csv --input bar.csv

However you organize your scripts, the one thing I recommend is setting your project root as the current working directory. cd-ing around to run different scripts for one project is cumbersome, can unintentionally change your sys.path, and can confuse other users, who have to contend with the same file structure and will likely assume your project root is where commands should be run from. That said, if you think your way makes your project structure easier to work with, feel free to stick with it, as long as you document it for others (and perhaps your future self). But what if you want to use your module from anywhere in your terminal? Well now…

Permanently adding modules to sys.path

Remember, python -m my_downloader worked in the previous examples because the current directory was /my_project and -m added it to sys.path. If you were to change to another directory, my_downloader would no longer be resolvable. This is one of the reasons we have pip: it installs packages to a common place, site-packages/, that Python always knows to search for modules [3] regardless of your current working directory. However, we’re not there yet, as pip can’t install just any plain old package. Your code needs to be packaged into a distribution that pip knows how to install. For this, I recommend looking into setuptools + pyproject.toml for writing your build configuration. Here’s the bare minimum you need to make a distribution package:

my_project/
├── my_downloader/
│   ├── __init__.py
│   └── ...
└── pyproject.toml
pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "my-github-downloader"
version = "0.1.0"

There are several other keys that can be written in the [project] table, but those two are the only required ones.
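For instance, if my_downloader uses requests like in the earlier example, you could declare that with the optional dependencies key, so pip installs it automatically alongside your package:

[project]
name = "my-github-downloader"
version = "0.1.0"
dependencies = ["requests"]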

Note

See how we didn’t say anything about my_downloader/ in pyproject.toml? This takes advantage of setuptools’s automatic discovery to include the my_downloader/ package in the distribution. This won’t work with all layouts, and other build systems like Hatch and Poetry handle package discovery differently.

With pyproject.toml created, you can tell pip to find it in your project root and install your distribution:

/my_project $ pip install .

And now you can run python -m my_downloader from any directory, and even import my_downloader in your other scripts!

Tip

You can also install your project in editable mode:

/my_project $ pip install --editable .

This removes the need to re-install your package every time you make changes to it. To avoid certain side effects, this mode is best used with src-layout.
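In case you haven’t seen it, src-layout just means nesting your package under an extra src/ directory, which setuptools’s automatic discovery also understands:

my_project/
├── src/
│   └── my_downloader/
│       ├── __init__.py
│       └── ...
└── pyproject.toml

Among other things, this prevents import my_downloader from silently picking up the local source directory instead of the installed copy.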