Data Structures - NumPy Arrays

ENV 859 - Geospatial Data Analytics   |   Fall 2024   |   Instructor: John Fay  

Introduction

A 2020 paper published in Nature described the role that the NumPy package has played in a range of scientific discoveries, noting that NumPy “underpins almost every Python library that does scientific or numerical computation.” How did NumPy gain such importance? First, it provides a scientific data structure - the N-dimensional array - that greatly simplifies many intricate calculations that are clumsy with Python’s native data types. Second, NumPy is fast: its array operations run in compiled code rather than in Python-level loops, making efficient use of the computer’s CPU.
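To make the contrast with native Python concrete, here is a minimal sketch of the same calculation done with lists and with arrays. The height and weight values are made up for illustration and are not taken from the course notebooks.

```python
import numpy as np

# Hypothetical sample values (assumed for illustration only)
heights_m = [1.60, 1.75, 1.82]
weights_kg = [55.0, 70.0, 90.0]

# Native lists: element-wise math needs an explicit loop or comprehension
bmi_list = [w / h ** 2 for h, w in zip(heights_m, weights_kg)]

# NumPy arrays: the arithmetic is applied to every element at once
heights = np.array(heights_m)
weights = np.array(weights_kg)
bmi_array = weights / heights ** 2

print(bmi_list)
print(bmi_array)
```

Both approaches give the same numbers; the array version simply states the formula once and lets NumPy handle the iteration.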

ESRI has embraced the importance of NumPy and has integrated it into its GIS software with tools designed specifically to convert feature classes, tables, and raster datasets to and from NumPy array objects. (See this link…)
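When a table is converted to NumPy, the result is typically a structured array, in which each record has named fields (ArcPy's arcpy.da.FeatureClassToNumPyArray, for example, returns one). The sketch below does not call ArcPy; it builds a small structured array by hand to show what working with one looks like. The field names and values are hypothetical.

```python
import numpy as np

# Hypothetical attribute table; field names and values are made up
table_dtype = [("HUC_ID", "U8"), ("AREA_KM2", "f8"), ("NAME", "U20")]
hucs = np.array(
    [
        ("H001", 2500.0, "Basin A"),
        ("H002", 1800.0, "Basin B"),
        ("H003", 3100.0, "Basin C"),
    ],
    dtype=table_dtype,
)

# Pull out one column by its field name
areas = hucs["AREA_KM2"]

# A boolean mask works like an attribute query
large = hucs[areas > 2000]

print(large["HUC_ID"])
print(areas.mean())
```

Selecting by field name and filtering with a mask cover much of what a simple attribute query does, without a cursor or a geoprocessing tool.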

As an introduction to NumPy, we will return to Jupyter Notebooks. I’ve created a GitHub repository that you will fork (i.e., create your own copy) into your own GitHub space and then clone to your local machine. This repository contains a number of notebooks to explore both NumPy and Pandas.

A summary of what is covered is provided below as well as some links for additional information.


Exercise Set up

1. Fork and clone the lesson repository to your local machine

  • Go to https://github.com/ and sign in to your account

  • Navigate to this site: https://github.com/ENV859/ScientificComputing

  • In the upper right portion of the repository’s page, you’ll see a button to Fork the repository.

    What happens when you fork a repository?

    Forking a repository creates a copy of that repository in your account. You own the forked copy and can treat it like your own. However, it also retains a link to the original repo so you can pull updates from it.

    A nice and complete description is provided here. That said, for now all you need to understand is that you are creating your own copy of the ScientificComputing repository and can start tracking your own changes to it, if you wish.

  • After you’ve forked the repository, you are taken to the GitHub page of your copy. Clone this repository to your local machine:

    • Open the Git Bash app on your local machine. It will open as a black console with a $ prompt.

    • At the prompt, type cd v: to switch to your mapped V: drive.

    • Type git clone https://github.com/<YOUR GIT USERNAME>/ScientificComputing, replacing <YOUR GIT USERNAME> with your actual GitHub username. (This is the path to your forked repository).

      Git Clone Image

    • You should now have a new folder named “ScientificComputing” on your V: drive. This is a clone of your forked repository.

2. Fire up Jupyter Notebooks

  • In the cloned repository is a shortcut to open Jupyter Notebooks as well as a folder of notebooks and a folder containing data used in some notebooks.

NumPy

What is NumPy?

  • Provides a new data type - the array - which can greatly speed up certain computations

    • Example: BMI from height and weight lists (00-Intro-to-NumPy.ipynb)
    • Intro: A quick glimpse into NumPy’s ndarray data type (01-NumPy-101.ipynb)
  • Incorporated into ArcGIS now as it provides useful (and fast) tabular analysis

    • Example: NC HUCs (02-Numpy-with-FeatureClasses.ipynb)
  • Converting rasters to NumPy arrays also allows for analysis beyond ArcGIS/ArcPy

    • Example: DEM -> NumPy array -> Computing TPI (03-Using-NumPy-With-Rasters.ipynb)
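As a rough illustration of the raster case, the sketch below computes a topographic position index (TPI) on a tiny, made-up DEM array. It assumes TPI is defined as each cell's elevation minus the mean of its eight neighbors in a 3x3 window, with edges handled by repeating the border values; the course notebook may use a different window or edge treatment.

```python
import numpy as np

# Hypothetical 4x4 DEM (elevations in meters) with a single peak
dem = np.array(
    [
        [10.0, 10.0, 10.0, 10.0],
        [10.0, 20.0, 10.0, 10.0],
        [10.0, 10.0, 10.0, 10.0],
        [10.0, 10.0, 10.0, 10.0],
    ]
)

# Pad by one cell on every side, repeating the edge values
padded = np.pad(dem, 1, mode="edge")
rows, cols = dem.shape

# Sum the eight neighbors by adding shifted views of the padded array
neighbor_sum = np.zeros_like(dem)
for dr in (-1, 0, 1):
    for dc in (-1, 0, 1):
        if dr == 0 and dc == 0:
            continue  # skip the center cell itself
        neighbor_sum += padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]

# TPI: cell elevation minus the mean elevation of its neighbors
tpi = dem - neighbor_sum / 8.0

print(tpi)
```

Positive values mark cells higher than their surroundings (the peak), negative values mark cells lower than their surroundings, and values near zero indicate flat or evenly sloping terrain.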

More on NumPy

Overall…

  • NumPy is all about arrays, i.e., N-dimensional data
  • It offers an alternative to ArcGIS/ArcPy for working with feature layer tables and raster datasets, and it can often be much faster than ArcGIS Pro or ArcPy.
  • NumPy is useful, but spend more time on Pandas…