Extending Python - Conda & Coding Environments
Introduction
We have now seen how Python can be extended by importing various modules and packages, but at present we only have access to Python’s built-in packages and those that have been pre-installed in our Python coding environment. Here, we explore how we can find additional packages and install them so that we can import them into our scripts.
Learning objectives
Topic | Learning Objectives |
---|---|
Python Packages | • Explain what “3rd party packages” are • Discover and navigate various Python package repositories |
What is Conda? | • Explain what Python distributions are, what purpose they serve, and what they include • Differentiate between Anaconda and Miniconda Python distributions • Explain the issue of conflicting packages in terms of package dependencies • Describe the role of package managers like Conda in addressing package conflicts via coding environments |
Using Conda in ArcGIS Pro: The Python Package Manager | • Navigate to the Package Manager in ArcGIS Pro • Explain why we can’t alter the default arcgispro-py3 environment • Clone the default arcgispro-py3 environment • Set the cloned environment to be the active environment • Install and update packages in a cloned environment |
Command Line Conda | • Navigate to the Conda command line interface associated with ArcGIS Pro • Identify the active coding environment via the command line prompt • Issue various Conda commands to list, create, clone, activate, & delete environments • Issue various Conda commands to install & update packages • Describe the concepts of “channels” and how to install packages from specific repositories |
♦ Python Packages
As we’ve seen in the last session, importing packages into our Python session can add a great deal of new functionality to our coding environment. What we haven’t really seen yet, however, is the amazing wealth of Python packages that are available. In fact, this is perhaps one of the strongest reasons for learning Python: there’s a reasonable chance that someone has already developed a Python package that will help you achieve whatever coding task you are tackling in a more efficient and effective way.
So to tap into this amazing resource, we first need to know where to find these packages and then how to install them into our Python coding environment so that we can use them.
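Before hunting for new packages, it can help to see what is already installed. The following is a minimal sketch using `importlib.metadata` from Python’s standard library; the function name `installed_packages` is just a label chosen for this illustration.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs for the installed packages."""
    found = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        # Skip the rare package whose metadata is missing or incomplete
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:
            found.add((name, version))
    return sorted(found, key=lambda item: item[0].lower())


if __name__ == "__main__":
    packages = installed_packages()
    print(f"{len(packages)} packages found in this environment")
    for name, version in packages[:10]:  # show just the first ten
        print(f"  {name} {version}")
```

Run it from any Python session and the list reflects whatever environment that session is using.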
Package Repositories
• The Python Cheese Shop and the Python Package Index, or PyPI
Early on, Python had links to something called the Python Cheese Shop, a web repository for gobs of user contributed Python packages (and a nod to Monty Python’s famous Cheese Shop sketch). This repository has since adopted the name Python Package Index, or PyPI, and is accessible here: https://pypi.org.
- Search PyPI for “mget”: MGEL’s Marine Geospatial Ecology Tools have finally been converted to Python!
• Finding useful packages…
Back in the Cheese Shop days, the number of packages was low enough to be somewhat browse-able. As of the writing of this document, PyPI hosts over a half-million packages (source), and this number grows faster and faster. So it’s hardly something you want to sift through to look for something useful. It does have search capability, or perhaps you want to navigate to https://pypistats.org/top and see what packages have the most downloads, but that’s still hit or miss. The bottom line is that the number of packages out there is so large that it takes word of mouth, an observant eye, or just some crafty web searching to find the useful ones, a lot like finding data. I’d say my best package discoveries have been made simply by looking at other people’s code and seeing what packages they import to execute tasks similar to what I am trying to do…
• Other repositories: Conda-Forge, GitHub sites – and a **caution**!
Conda-Forge (https://conda-forge.org/) is another package repository, also with mediocre search capability:
- Try searching for “GIS” on their “Packages” page (https://conda-forge.org/feedstocks/), then select “QGIS”.
- Open the QGIS link (https://github.com/conda-forge/qgis-feedstock): a GitHub site hosting the Python package!
Many Python packages are hosted on GitHub (for good reason). And you may discover a Python package on GitHub without going through a repository as anyone can develop and host a Python package on GitHub.
And here comes my **caution**. Installing a Python package, which we’ll do in a moment, has the potential for installing some nefarious code on your machine: code that can open back doors, delete files, and do other nasty stuff. This is quite rare, but still worth thinking about. What’s nice about downloading packages hosted on PyPI or Conda-Forge is that these have had at least some vetting. There’s still risk, but it’s lessened.
And on to how we install these packages…
♦ What is Conda?
Conda is a Python package manager, meaning it facilitates installing packages so you can import them into your scripts. Conda comes along with Python if you’ve installed the Anaconda or Miniconda distributions of Python. Before answering exactly what a “Python package manager” is or what “Conda” is, let’s reexamine the levels of Python installations, starting with how package managers are “installed”.
What are Python “Distributions”?
Python can be installed as a barebones, command line driven application to run code or scripts, but you’ll benefit greatly by installing a packaged “distribution” of Python. Enthought Canopy, Anaconda, and Miniconda are some of the more popular Python distributions. When installed, these install not only Python, but a number of useful packages, and a package manager. So that’s it: Python distributions are simply installations of Python that come with a few extras when loaded on your machine.
Is the version of Python installed with ArcGIS Pro a Python distribution?
Yes! ArcGIS Pro installs the Miniconda distribution of Python. It’s a bit tweaked, but it’s there and it includes this thing called Conda.
Miniconda vs. Anaconda distributions
The Miniconda and Anaconda Python distributions are two manifestations of the Conda package manager. The key differences are their size and their interfaces. Miniconda is a minimal installation that uses commands typed at a command prompt to manage (install/uninstall) Python packages. Anaconda comes with many packages pre-installed and adds a graphical interface. See this link for a thorough comparison. [ArcGIS Pro also adds a graphical user interface to Miniconda.]
So what is Conda?
Let’s answer this by first asking: Why do Python packages need “managing”?
First, package managers streamline the process of installing all the bits that comprise a Python package. Early on, packages were just additional Python scripts, i.e. text files with a .py extension. We just needed to download the right files and put them in a place where Python could see them, and they’d probably work fine. Since then, packages have grown to be much more complex, requiring, for example, C++ libraries that need compiling to match the architecture of your machine. Installations needed to be exacting for all the pieces to work together. Package managers like Conda take care of downloading all the correct files and placing them in the right spots for the package to work.
Package managers also handle dependencies for a given package. Many Python packages are written to build on existing packages. For example, later on we’ll be using a package called GeoPandas, which was developed to give another package, Pandas, geospatial capability. In code-speak, Pandas is a dependency of GeoPandas, and thus if one were to install GeoPandas, Pandas would need to be installed as well. A package manager recognizes this and installs all dependencies for the package you ask it to install.
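To make the idea of dependencies concrete, here is a small sketch that asks an installed package which other packages it declares as requirements. It uses `importlib.metadata.requires` from the standard library. The helper name `declared_dependencies` is made up for this example, and the list it returns comes from the metadata the package ships with, which may not match Conda’s own dependency records exactly.

```python
from importlib import metadata


def declared_dependencies(package_name):
    """Return the requirement strings a package declares, or None if it is not installed."""
    try:
        requirements = metadata.requires(package_name)
    except metadata.PackageNotFoundError:
        return None
    # requires() returns None when a package declares no requirements at all
    return requirements or []


if __name__ == "__main__":
    for name in ("geopandas", "pandas"):
        deps = declared_dependencies(name)
        if deps is None:
            print(f"{name}: not installed in this environment")
        else:
            print(f"{name} declares {len(deps)} requirement(s):")
            for requirement in deps:
                print("   ", requirement)
```

If GeoPandas is installed, you should find Pandas somewhere in its list of requirements.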
But it gets a bit more complex than that, as a package can get out of sync with its dependencies: what happens to GeoPandas if the developers at Pandas decide to alter or deprecate a few key commands? This actually happens quite frequently, and the results can create havoc.
To overcome these discrepancies, package managers are able to “version lock” specific installations. For example, if GeoPandas was written to work with Pandas version 2.1.1, then Conda could make sure that that specific version of Pandas is installed, potentially downgrading any existing version of Pandas!
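The logic behind a version lock can be sketched in a few lines. This is a toy illustration only: it assumes simple dotted version numbers like 2.1.1, and the function names are invented here. Conda’s real solver handles far more (version ranges, build strings, and whole chains of dependencies).

```python
def parse_version(text):
    """Turn a simple dotted version such as '2.1.1' into a tuple of ints: (2, 1, 1)."""
    return tuple(int(part) for part in text.split("."))


def plan_change(installed, pinned):
    """Describe what must happen to the installed version to honor the pinned one."""
    have, want = parse_version(installed), parse_version(pinned)
    if have == want:
        return "keep"
    return "downgrade" if have > want else "upgrade"


# A newer Pandas is installed, but the (hypothetical) pin asks for 2.1.1
print(plan_change("2.2.0", "2.1.1"))  # downgrade
print(plan_change("2.1.1", "2.1.1"))  # keep
```

Comparing tuples of integers rather than raw strings matters: as text, "2.10.0" sorts before "2.9.0", which is the wrong way around.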
Another benefit of package managers: Links to [reputable] Python package repositories
As alluded to above, package managers like Conda also have links to online repositories. When we ask Conda to install a package, we don’t usually have to download it first; instead, Conda will search existing repositories for the necessary files suited for our particular machine and installation of Python, download them automatically, and install them! Furthermore, unless we specifically override some defaults, Conda will only pull packages from reputable sources, lessening the risk of installing any type of malware on our machine.
To sum up…
So you see, package managers provide a nice service! By installing the Miniconda or Anaconda distribution of Python, not only do we get Python on our machine (along with a set of frequently used or useful Python packages - like Jupyter - pre-installed), we get a robust package management system that allows us to easily add more packages to our coding environment.
But wait, there’s more!
But now you might be asking: what if I need both the latest version of Pandas and GeoPandas (which requires an older version of Pandas)? The truth is that you can’t have both in one place, but there is a nifty compromise in that you can have multiple coding environments.
What are “coding environments”?
Coding environments are separate virtual Python installations living on the same machine. They allow us to mix and match sets of Python packages so that we can overcome conflicts such as the one mentioned above. Or they can simply serve as coding “sandboxes” where we can test installations without messing up a known working installation.
Conda also allows us to create these separate coding environments. But enough chatter! Let’s see how all this works!
♦ Using Conda in ArcGIS Pro: The Python Package Manager
First, we’ll see how Conda works from the ArcGIS Pro perspective, as it provides a gentle introduction. This is all done through ArcGIS Pro’s Python Package Manager, which has a nice interface to manage both environments and packages.
To access ArcGIS Pro’s Python Package Manager:
Open ArcGIS Pro. There’s no need to open a new Project. Instead, at the screen where you create a new project or choose an existing one, click the Settings button on the left side menu pane.
In the Settings pane, select Package Manager from the left hand menu. This will open your “Package Manager”.
Here you’ll see your currently active environment as well as all the packages that are installed with that environment.
Most likely, your default environment is arcgispro-py3 and you’ll see a notice that “Conda cannot modify the default Python environment”. This is for good reason: ESRI keeps this environment locked so that you aren’t able to install, remove, or update packages that may conflict with the working version of ArcGIS Pro.
And now, we’ll run through a few exercises to get you familiar with the Python Package Manager.
Exercises:
1. Cloning an environment
Since we can’t modify the default environment, we’ll create our own and then change that.
- Click on the gear icon to the right of the Active Environment (upper right hand portion of the screen). This opens up the Environment Manager, which lists all the available environments. You likely just have the one, which is fine.
- In the upper right corner of this window are buttons to clone the default environment and to add an existing environment. Click the one to clone the existing environment.
- Select the default location, but notice where it will be stored. Will you be able to access this environment from other machines?
- It may take several minutes to clone the environment…
- When the clone is complete, activate the new environment by clicking the ellipsis icon (...) to the right of it and selecting Activate. You can then close the Environment Manager window. You can also activate the environment from the Package Manager window via the dropdown in the upper right corner.
Now when you close and restart ArcGIS Pro (on this same machine), it will use this Python environment. We can add more Python packages and they will be available to your ArcGIS session. We’ll see how this is done and where this comes into play next!
2. Installing and updating packages
Now we have an environment that is identical to ESRI’s default, but that we can change. We’ll start by adding some packages, starting with Spyder, an alternate IDE to VS Code.
- Select Add Packages in the Package Manager.
- In the Search box, search for spyder, then click on spyder beneath. Then click Install. You will have to agree to terms and conditions, and then it may take several minutes to install.
→ The package manager will install all dependencies for Spyder when it installs Spyder.
Where do packages get installed?
As Spyder is installing, open Windows Explorer and navigate to this folder: %localappdata%\ESRI\conda\envs. This is where the Python Package Manager creates new environments by default. (Note: %localappdata% is a Windows variable that points to a folder within your user folder; it is a shortcut to the C:\users\<your username>\AppData\Local folder.)
- In this folder, you’ll see your new Python environment listed as a sub folder. Generally speaking, you should not mess with any files in this folder as it may corrupt your environment. But there are some useful files in here.
- Within your environment folder, you’ll see a subfolder called Scripts. Open that folder, and you’ll see the Spyder.exe file, which is used to start Spyder. You’ll also see jupyter-notebook.exe, which is used to start Jupyter notebooks.
That’s really all there is to it! It’s pretty intuitive to update and uninstall packages from this interface. (Addressing package installation issues and errors, however, is not nearly as intuitive…)
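If you would rather peek at the environments folder from Python than from Windows Explorer, here is a minimal sketch. It assumes the default %localappdata%\ESRI\conda\envs location described above. The function name `list_environments` is made up for this example, and on a machine without that folder it simply reports nothing.

```python
import os
from pathlib import Path


def list_environments(envs_root):
    """Return the names of the sub folders in an 'envs' folder (empty list if it is missing)."""
    root = Path(envs_root)
    if not root.is_dir():
        return []
    return sorted(child.name for child in root.iterdir() if child.is_dir())


if __name__ == "__main__":
    # expandvars swaps in the real path on Windows; elsewhere the text is left unchanged
    envs_root = os.path.expandvars(r"%localappdata%\ESRI\conda\envs")
    print("Looking in:", envs_root)
    for name in list_environments(envs_root):
        print("   ", name)
```

This only reads folder names, so it is safe to run; it does not touch any of the files inside the environments.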
♦ Command line Conda
While ArcGIS Pro’s Python Package Manager provides a nice interface for managing Python environments and packages, it does not expose the full capability of Conda. Now we will look at the command line Conda interface to compare how it differs and to see what else we can do.
Knowing how to run Conda from the command line is also useful if you want to just install Python on your personal machine, i.e., without ArcGIS Pro. If you are interested in that, you’ll first need to install Miniconda on your machine, then these exercises should work!
The full documentation for managing environments using Conda is here:
https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html
but we’ll do a few exercises to get us started.
Exercises:
1. Getting to the Conda command line
- To access Conda if you have ArcGIS Pro installed, open the Python Command Prompt from the Windows Start menu.
Note that in the command prompt window that appears, the active environment is shown in parentheses at the front of the prompt. The above figure shows that I’m using the arcgispro-py3-clone environment.
2. Listing environments and listing installed packages
- To list what environments have been created on your machine, type the following Conda command:
conda info --envs
→ What environments do you see listed?
- To list what packages are installed in the current environment:
conda list
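Conda commands can also be driven from a Python script, which is handy when you want to work with the results rather than just read them. The sketch below assumes that `conda` can be found from the prompt you are using and that adding `--json` to `conda info --envs` produces JSON containing an "envs" list. The function names are made up for this example.

```python
import json
import shutil
import subprocess


def parse_env_paths(json_text):
    """Pull the list of environment folders out of Conda's JSON output."""
    return json.loads(json_text).get("envs", [])


def conda_env_paths():
    """Ask Conda for its environments; return None if Conda can't be queried."""
    conda = shutil.which("conda")
    if conda is None:
        return None  # conda is not on the PATH of this prompt
    try:
        result = subprocess.run([conda, "info", "--envs", "--json"],
                                capture_output=True, text=True)
        if result.returncode != 0:
            return None
        return parse_env_paths(result.stdout)
    except (OSError, ValueError):
        return None  # conda could not be run, or its output was not JSON


if __name__ == "__main__":
    paths = conda_env_paths()
    if paths is None:
        print("Could not query Conda from this prompt")
    else:
        for path in paths:
            print(path)
```

Each entry is the full folder path of an environment, so the last part of the path is the environment’s name.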
3. Creating & cloning environments
The conda create command is used to create a new environment or clone an existing one. To create a new environment, we need to supply a name, done by adding --name followed by the name you want to give your environment. To clone an existing environment, we again need to supply a name, but also specify which environment we want to clone using the --clone option.
- Create a new environment named “my_new_env”:
conda create --name my_new_env
- Create a new environment that uses a specific version of Python (here 3.9):
conda create --name my_py39_env python=3.9
- Clone our “arcgispro-py3” environment to “my_gis_env”:
conda create --name my_gis_env --clone arcgispro-py3 --pinned
The “pinned” parameter locks existing packages so that they can’t be updated (thus preventing conflicts with required packages).
- For more help on the “conda create” command, type:
conda create --help
4. Activating environments
Once an environment has been created, you need to “activate” it to use it.
- Activate our new “my_new_env” environment by typing:
activate my_new_env
(In a standard Miniconda or Anaconda prompt, the recommended form is conda activate my_new_env.)
→ Notice that the prompt now indicates that you are using the environment.
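The prompt is one way to tell which environment is active; you can also ask Python itself. Here is a small sketch, with a function name invented for illustration. It relies on the CONDA_DEFAULT_ENV variable that Conda sets when an environment is activated, so that value will be empty if Python was started some other way.

```python
import os
import sys


def describe_active_environment():
    """Collect a few facts that identify the Python environment running this code."""
    return {
        # Set by Conda on activation; None if no Conda environment was activated
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        # The folder this Python installation (environment) lives in
        "environment_folder": sys.prefix,
        "python_executable": sys.executable,
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_active_environment().items():
        print(f"{key}: {value}")
```

Try running it before and after activating a different environment and compare the folder and Python version reported.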
5. Installing packages
With our “my_new_env” now active, we can install packages to that environment.
- Install “pandas” in the new environment:
conda install pandas
- Now install two more packages (“requests” and “jupyter”):
conda install requests jupyter
- And finally, install “geopandas”:
conda install geopandas
Note the adjustments Conda makes to allow geopandas to install: this may be where conflicts arise!
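Once the installs finish, a quick way to confirm they worked is to check whether Python can find each package in the active environment. This is a minimal sketch using `importlib.util.find_spec` from the standard library; the helper name `is_importable` is made up, and it is meant for top-level package names such as the ones installed above.

```python
from importlib import util


def is_importable(module_name):
    """True when Python can locate the (top-level) module in the current environment."""
    return util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("pandas", "requests", "geopandas"):
        status = "found" if is_importable(name) else "NOT found"
        print(f"{name}: {status}")
```

If a package shows as not found, double check that the environment you installed it into is the one that is active.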
Channels
Conda retrieves packages from remote servers known as “channels”. When ArcGIS Pro is installed, it configures a default set of these channels, but in some cases we want to install packages hosted on other channels. You’ll often know this is the case when you search for the package online and its homepage gives the Conda command to install it, including the channel.
For example, installing the “arcgis” package requires reference to the “esri” channel, so the command would be one of:
conda install --channel esri arcgis
~ or ~
conda install -c esri arcgis
~ or ~
conda install esri::arcgis
More info on Conda channels is here.
6. Exporting and importing environments
Environment configuration can be written to a text file, often assigned a .yml extension. This text file can then be used to construct an identical environment on another machine (one running the same operating system, since the file lists platform-specific package files).
- conda list --explicit > V:\myenv.yml
This will create a new file on your V: drive (or whatever path you specify in the command) that contains all the info to rebuild this environment on a separate machine.
- conda create --name my_other_env --file V:\myenv.yml
This will create a new environment using the settings saved in that “myenv.yml” file.
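It’s worth opening the exported file in a text editor: it is just a few comment lines, a line reading @EXPLICIT, and then one package URL per line. The sketch below reads that layout and lists the package files it names. The function names are invented for this example, and the sample text uses made-up example.org URLs rather than output from a real environment.

```python
def parse_explicit_spec(text):
    """Return the package URLs listed in the text of an explicit Conda spec file."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and the @EXPLICIT marker line
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        urls.append(line)
    return urls


def package_file_names(urls):
    """Reduce each URL to its package file name (dropping any '#checksum' suffix)."""
    return [url.split("#")[0].rsplit("/", 1)[-1] for url in urls]


# Made-up sample that mimics the layout of an exported file
SAMPLE = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: win-64
@EXPLICIT
https://example.org/channel/win-64/python-3.9.18-0.conda
https://example.org/channel/win-64/pandas-2.1.1-py39_0.conda#0123abcd
"""

print(package_file_names(parse_explicit_spec(SAMPLE)))
# ['python-3.9.18-0.conda', 'pandas-2.1.1-py39_0.conda']
```

Notice the platform name in the URLs: each line points at a file built for one operating system.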
7. Removing environments
When you want to tidy up:
- To remove an environment:
conda env remove --name my_other_env
Conda vs Pip
pip is an alternative to Conda for installing Python packages. Here is a link to an in-depth discussion about how the two are similar and different. It gets a tad confusing, but here’s the takeaway:
- It’s best to stick with one method or the other, and since ArcGIS uses Conda, that’s our first choice (in this class, at least).
- Sometimes, however, Conda simply can’t install a package successfully; that’s when you see if pip can help where Conda can’t. Doing so, however, may corrupt your virtual environment. If it does, you’ll need to delete the corrupt environment and rebuild it.
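If you do end up mixing the two, it helps to know which tool installed a given package. Many packages carry a small INSTALLER record in their metadata, typically reading “pip” or “conda”, though not every package has one. The sketch below reads that record with `importlib.metadata`. The function name `installer_of` is made up, and the result should be treated as a hint rather than a guarantee.

```python
from importlib import metadata


def installer_of(package_name):
    """Return the recorded installer for a package, 'unknown' if none, or None if not installed."""
    try:
        dist = metadata.distribution(package_name)
    except metadata.PackageNotFoundError:
        return None
    installer = dist.read_text("INSTALLER")  # None when the record is absent
    return installer.strip() if installer else "unknown"


if __name__ == "__main__":
    for name in ("pandas", "requests", "geopandas"):
        print(f"{name}: {installer_of(name)}")
```

A package reported as installed by pip inside a Conda environment is a good first suspect if that environment starts misbehaving.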
Recap
Python packages are an amazing resource, greatly facilitating countless coding tasks. But managing these packages requires a bit of know-how. If you have ArcGIS Pro installed, you can use its Python Package Manager to help, but investing a little time to learn Conda commands can be a real asset as it allows you to create and modify new environments quite efficiently.
Bonus: Installing QGIS
Some of you have indicated an interest in using QGIS. Here’s how you can install it on your ArcGIS Pro (or Miniconda-enabled) machine!
Run the following from your Python Command Prompt:
- Create a new environment, forcing install of Python v 3.9:
conda create --name qgis_env python=3.9
- Activate the new environment:
activate qgis_env
- Install QGIS:
conda install -c conda-forge qgis
- When complete, open QGIS by typing qgis at your command prompt. (Alternatively, you should also see QGIS in your Windows Start menu!)
If you run into lengthy pauses with the message “Solving environment:.”, you may instead want to switch over to using “mamba” to install your packages. Over time, Conda has (by its developers’ own admission) become quite inefficient at solving package conflicts. More and more people are moving to mamba, which uses a far more efficient algorithm to solve dependencies, to install complex packages.
Using “Mamba” (after creating and activating the qgis_env):
conda install conda-forge::mamba
mamba install -c conda-forge qgis
This may or may not work - an unfortunate artifact that the package landscape is evolving so fast, compounded by the fact that the Miniconda installation associated with ArcGIS Pro is typically well behind the latest version.