Intro to Git and GitHub

ENV 859 - Geospatial Data Analytics   |   Fall 2023   |   Instructor: John Fay  

Introduction

I don’t know anyone who can just sit down and write a Python script from start to finish in one go. IDEs like VS Code help, but errors sneak in, and often you discover a flaw in your coding logic that requires some retooling. This is where version control comes to the rescue. You may have heard of Git and GitHub in this context; they are two of the heavy hitters in this category. Here we dig into what these technologies are and how to leverage them in your coding adventures.

Before starting this lesson, you’ll need a free GitHub account. If you don’t already have one, go to https://github.com and create one before continuing. Note that as students, you are eligible for the GitHub Student Developer Pack, which has some cool benefits. Be sure to check it out!

You will also need Git installed on your local machine. It can be downloaded here: https://git-scm.com/downloads

Learning objectives

On completion of this lesson, you should be able to:

  • Explain the utility of version control in coding projects
  • Describe, at least in general terms, the Git versioning workflow
  • Initialize your VSCode workspace as a Git-enabled workspace
  • Add, stage, and commit changes to the local repository
  • Authenticate your local machine to work with GitHub
  • Clone a GitHub repository to your local machine

1. Diving In

In this set of steps, we are going to configure our VS Code Turtle Tracking workspace to use Git and GitHub. There’s a lot to unpack here; I find it’s best to first walk through the steps involved in using these technologies and then explain what we did. So let’s dive in.

1.1 Initializing our local workspace as a Git repository

  • Open your Turtle Tracking Project in VS Code.

  • On the left-hand Activity Bar, click the Source Control icon (or press <ctrl-shift-G>).

  • In the Source Control panel, you’ll be offered two options: Initialize Repository and Publish to GitHub. Select Initialize Repository.

    This action enables version control for your entire workspace: the Git software installed on your local machine will now observe all the files in the project folder. Git tracks the files you tell it to track and keeps all the tracking information in a newly created folder called .git inside your project folder. (This folder may be hidden from view, depending on your Windows settings.) Never edit, move, or delete the contents of this folder yourself.

    :point_right: In version control lingo, your project workspace is now a Git repository.

  • The Source Control panel now displays the version status of your workspace. You should see the three files you created in the last exercise with a green U under a collapsible menu named “Changes”.

    :point_right: The green U indicates the file is untracked, meaning Git sees it, but is not tracking any changes you make to it.
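If you’re curious what VS Code is doing behind the scenes, the same step can be run from a terminal with the Git command line. The sketch below is illustrative only: it assumes Git is on your PATH and works in a throwaway temporary folder, with made-up contents standing in for your three Turtle Tracking files. In the output of git status --short, untracked files are marked ?? (VS Code’s green U).

```shell
#!/usr/bin/env bash
set -e

# Throwaway folder standing in for the Turtle Tracking workspace (assumption)
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"

# Placeholder versions of the three files from the previous exercise
echo "Turtle tracking project" > ReadMe.txt
echo "placeholder tracking data" > Sara.txt
echo "placeholder metadata" > Sara_README.txt

# Equivalent of "Initialize Repository": creates the hidden .git folder
git init -q
ls -a

# List what Git sees; "??" marks untracked files (VS Code's green U)
git status --short
```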

1.2 Staging and committing changes to our local Git repository

  • Click the :heavy_plus_sign: that appears when you hover your mouse over the ReadMe.txt listing in the Source Control pane.

  • The file moves to the Staged Changes collapsible menu and the U becomes an A. This means the file has been “added to the staging area”, or more succinctly it has been “staged”.

    :point_right: A file that has been put in the staging area (i.e., “staged”) will be included in the next commit.

  • In the box labeled Message, type the following message: “Initial commit of ReadMe.txt file”.

  • Click the Commit button.

    • You may be asked to configure your “user.name” and “user.email” in Git; if so, follow the steps in section 1.3 below, then return here.

    The preceding two steps commit the ReadMe.txt file to Git’s versioning memory. The file is now essentially saved in two locations: on our file system and in Git’s database. If we were to delete it from the project folder, we could recover it from Git’s database. Or, if we were to change the ReadMe.txt file and save it, we could revert those changes by recalling the version saved in Git’s database. The commit message we typed helps us find the exact entry in Git’s database where we added (or modified) this file.
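The command-line equivalent of staging and committing is git add followed by git commit. This sketch again uses a throwaway folder with a placeholder file, and sets a placeholder name and email for that repository only (without an identity, Git refuses to commit). It also demonstrates the recovery described above by deleting ReadMe.txt and restoring it from the last commit.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with one placeholder file (assumption)
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"
git init -q
echo "Turtle tracking project" > ReadMe.txt

# Repository-only identity so the commit can be recorded (placeholder values)
git config user.name "Example Student"
git config user.email "student@example.com"

# Stage the file; "A " in the short status matches VS Code's "A"
git add ReadMe.txt
git status --short

# Commit the staged file with a message
git commit -q -m "Initial commit of ReadMe.txt file"

# Delete the working copy, then restore it from the last commit
rm ReadMe.txt
git checkout HEAD -- ReadMe.txt
cat ReadMe.txt
```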

► Exercise: Stage and commit the data and metadata files…

  • Stage both Sara.txt and Sara_README.txt (or whatever you named your metadata file).
  • Commit these two files, typing the message “Initial commit of the tracking file and metadata”.
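If you prefer the terminal for this exercise, several files can be passed to a single git add and then recorded together in one commit. As before, the folder, file contents, and identity below are placeholders, not your actual project.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository with a first commit already in place (assumption)
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "Turtle tracking project" > ReadMe.txt
git add ReadMe.txt
git commit -q -m "Initial commit of ReadMe.txt file"

# Placeholder data and metadata files
echo "placeholder tracking data" > Sara.txt
echo "placeholder metadata" > Sara_README.txt

# Stage both files, then record them together as a single commit
git add Sara.txt Sara_README.txt
git commit -q -m "Initial commit of the tracking file and metadata"

# One line per commit, newest first
git log --oneline
# Summary of the latest commit plus the files it touched
git show --name-only --oneline HEAD
```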

1.3 Configuring Git to work with GitHub

Before continuing, we need to tell Git who we are. Git, the software installed on our local machine, stamps every commit with a name and email address, and GitHub, the cloud service, uses that email address to link those commits to your account. For this step, you’ll need the username and email address you used when creating your GitHub account.

  • From the VSCode Terminal menu, select New Terminal to open up a new terminal prompt in your session.

  • At the terminal prompt, type:

    git config --global user.name "<your GitHub username>"

    replacing <your GitHub username> with your actual GitHub username (keep the quotes), and hit Enter.

  • Then type:

    git config --global user.email "<your GitHub email>"

    replacing <your GitHub email> with the email address associated with your GitHub account (keep the quotes), and hit Enter.

Git will now label your commits with this name and email address. (Signing in to GitHub itself happens in the next step.) You can close the terminal.
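To check what Git has stored, run git config with a setting name but no value, and it prints the current value. The sketch below points HOME at a temporary folder so it does not overwrite your real settings (skip that part in practice); the name and email are placeholders. Note the quotes, which are required whenever a value contains spaces.

```shell
#!/usr/bin/env bash
set -e

# Demo only: send "--global" settings to a throwaway home folder so your
# real ~/.gitconfig is untouched (skip these two lines in practice)
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# Quote values that contain spaces; both values here are placeholders
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.com"

# Read the settings back to confirm what Git will stamp on future commits
git config --global user.name
git config --global user.email
```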

1.4 Publishing your local Git repository to GitHub

  • Back in the Source Control panel, click the button labeled Publish Branch.
  • You will likely be asked to allow VSCode to sign in using GitHub. Allow this, and follow the instructions to log in via a web browser to authenticate your VSCode-Git-GitHub link.
  • Select to publish your local Git repository as a public GitHub repository. You may again be asked to authenticate your local Git account with GitHub via a browser interface.
  • Open your GitHub page and navigate to your repositories. You should see a remote copy of your workspace!
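Roughly speaking, Publish Branch creates the repository on GitHub, registers it with your local repository as a remote named origin, and pushes your commits to it. The sketch below shows the last two steps with the Git command line. So that it runs without a network connection or a GitHub login, a bare repository in a temporary folder stands in for GitHub (an assumption of this demo); with a real GitHub repository you would pass its URL, e.g. https://github.com/<username>/<repository>.git, and Git would ask you to authenticate.

```shell
#!/usr/bin/env bash
set -e

# A bare repository in a temp folder stands in for GitHub (assumption)
REMOTE_DIR="$(mktemp -d)/TurtleTracking.git"
git init -q --bare "$REMOTE_DIR"

# Local repository with one placeholder commit
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"
echo "Turtle tracking project" > ReadMe.txt
git add ReadMe.txt
git commit -q -m "Initial commit of ReadMe.txt file"

# Register the remote under the conventional name "origin", then push the
# current branch and remember it as the upstream for later pushes/pulls
git remote add origin "$REMOTE_DIR"
git push -q -u origin HEAD

git remote -v
```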

So what just happened?

We now have a version controlled repository on our local machine with three files committed to it.

Soon, we will be adding new files to our coding workspace and editing them: script files, perhaps more raw data files, processed data files, etc. Each time we add or change files and want to make sure that work never gets lost, we stage the files and commit them to our local Git repository. Each commit carries a message that lets us (our “future selves”) see what we did and when, so we can undo specific old changes or roll everything back to a certain point. This means we can boldly write code without saving files as “Change1.py”, “ImprovedChange1.py”, “FixedImprovedChanges1.py”. Instead, we keep a single working Python script and can still revert it to any earlier committed version.
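Here is a small sketch of undoing a change, using a hypothetical script, track.py, in a throwaway repository. git revert records a new commit that reverses an earlier one, so the history is kept intact. (By contrast, git reset --hard <commit> rewinds the repository to an earlier commit and discards everything after it, so use it with care.)

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository holding a hypothetical script (assumption)
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# Version 1 of the script
echo 'print("version 1")' > track.py
git add track.py
git commit -q -m "Add tracking script"

# Version 2: edit the same file rather than saving a renamed copy
echo 'print("version 2")' > track.py
git commit -q -am "Change the print message"

# Undo the latest commit by recording a new commit that reverses it
git revert --no-edit HEAD
cat track.py

# Three commits now exist, each with its own SHA and message
git log --oneline
```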

Not only are these changes recorded in our local Git repository; each time we push them, they are also copied to the cloud, on GitHub.com. This has several uses. First, having our work in the cloud means we can pull our repository onto any machine with the proper software installed and continue working. And if our local hard drive dies, we have a backup! Hosting our code on GitHub.com also allows others to view our work (if we want), which is a cornerstone of open source coding.


What now?

There’s still a bit to learn about what Git and GitHub are and how they are used. However, now that we’ve had some exposure, I’m hopeful the concepts will settle in more quickly. Let’s now step back and examine these concepts, starting with terminology.

Terminology

Version control, also called “source control”, is used to track and store changes in your files without losing the history of your past changes. Picture yourself writing a paper in Microsoft Word: you continuously make edits to your document and save them, typically to the same file. But what if you wanted to undo the edits you made two saves ago? You can’t. Version control lets you do exactly that by recording every set of changes you make to a document (or set of documents). It can be a game changer when it comes to coding.
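To make the Word analogy concrete, this sketch commits three “saves” of a hypothetical paper.txt and then retrieves the version from two saves ago. HEAD refers to the latest commit and HEAD~2 to the commit two steps before it; the file name and contents are made up for illustration.

```shell
#!/usr/bin/env bash
set -e

# Throwaway repository holding a hypothetical "paper" (assumption)
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# Three "saves" of the same file, each recorded as a commit
for draft in "First draft" "Second draft" "Third draft"; do
    echo "$draft" > paper.txt
    git add paper.txt
    git commit -q -m "Save: $draft"
done

# Print the file as it was two commits before the latest one
git show HEAD~2:paper.txt

# Bring that older version back into the folder (staged, not yet committed)
git checkout HEAD~2 -- paper.txt
cat paper.txt
```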

Version control has a particular workflow and many moving parts, which together make for a somewhat steep initial learning curve. However, with practice and some trial and error, you should gain enough command over the technology to make it work for you. We’ll begin with some basic vocabulary, which will make more sense once we put it into context with examples.

♦ Basic vocabulary of version control

There are more terms than these in the world of version control, but these are enough to get you started without overly confusing issues.

  • Repository: a location where all the files for a particular project are stored, usually abbreviated to “repo.” You will likely maintain at least one local repository (on the machine(s) where you are writing your code) and one remote repository (that lives in the cloud and is periodically synchronized with your local repository).
  • Clone: Cloning is the process of creating a new local repository from a remote one.
  • Staging: Staging a file adds it to the set of files that you are going to log as a single commit in your local repository. You can stage one or many files for each commit.
  • Commit & Commit message: A commit is a record of changes or additions to files that have been marked as “staged”. The record of the commit is stored in the local Git database and is referenced by an internal identifier (called a “SHA”) as well as a commit message that you provide. Git keeps a full history of all the commits you have made, and you can undo (“revert”) a commit, which actually creates a new commit that reverses the changes made by the one you chose. You can also reset your entire repository to a specific commit, deleting all changes since that commit.
  • Pushing/Pulling: These refer to actions used to synchronize the remote and the local repositories. Changes in the local repo are pushed to the remote, and changes in a remote repo are pulled to the local repository. (By default, Git names the remote repository “origin”, and the remote branch that a local branch pushes to and pulls from is called its “upstream”.)
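The following sketch walks through cloning, pushing, and pulling between two working copies, a “laptop” and a “lab machine”, that share one remote. To keep it runnable offline, a bare repository in a temporary folder stands in for GitHub (an assumption); the folder names and commit messages are hypothetical.

```shell
#!/usr/bin/env bash
set -e

WORK="$(mktemp -d)"
# Bare repository standing in for the GitHub remote (assumption)
git init -q --bare "$WORK/remote.git"

# "Laptop": clone the (still empty) remote, commit, and push
git clone -q "$WORK/remote.git" "$WORK/laptop"
cd "$WORK/laptop"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "Turtle tracking project" > ReadMe.txt
git add ReadMe.txt
git commit -q -m "Initial commit of ReadMe.txt file"
git push -q -u origin HEAD

# "Lab machine": clone the remote, which now contains that commit
git clone -q "$WORK/remote.git" "$WORK/lab"

# Back on the laptop: make another change and push it
echo "Now with notes" >> ReadMe.txt
git commit -q -am "Add notes to ReadMe"
git push -q

# On the lab machine, pull (fast-forward only) to bring in the new commit
cd "$WORK/lab"
git pull -q --ff-only
cat ReadMe.txt
```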

♦ What are Git and GitHub?

Git and GitHub are technologies that facilitate versioning. They actually do a lot more than that, but let’s focus on the versioning bit, beginning with the difference between Git and GitHub:

Git is software installed on your local machine that enables version control. The application itself can be used via a command line shell (e.g., “Git Bash” on Windows), but many applications such as VS Code include graphical interfaces that issue Git commands for you. Either way, it’s the Git software that provides the actual engine for managing the Git versioning database within each of your local repositories. Git has been installed on all NSOE machines, but if you need to install it on another machine, you can download it at: https://git-scm.com/

GitHub is a cloud service where the remote repositories are kept. Each individual has their own GitHub account, and within this account you maintain your repositories. These repos can be public (anyone can see and pull from them) or private (only you and people you grant access can see and pull from them). You can maintain exclusive control over what gets pushed to a repository, or you can add collaborators who can also push to it.

:point_right: GitHub is not only a valuable resource for maintaining your own versioned repositories; it also allows coders everywhere to share their work so that others can build on it. Check out Esri’s GitHub page, https://github.com/Esri, and explore the wealth of materials available there.

:point_right: More about Git, GitHub, and VS Code can be found here: https://code.visualstudio.com/docs/sourcecontrol/overview. There’s a lot to take in there, too much for right now probably. However, I recommend you review the page to see what’s on there, and come back to it from time to time to get deeper insight into what we cover in class.


More Git/GitHub workflows

Cloning a GitHub workspace

Cloning is the process of creating a local Git repository from a remote repository located on GitHub. This is useful if, for example, you move to a new machine and want to continue working on a coding project you already started and pushed to GitHub. You can clone any repository you can see on GitHub, but you can only push changes to a repository in your own GitHub account or to one where someone has added you as a collaborator.

Here are the steps to clone a repository within VS Code.

  • Open VS Code.
  • Open the Command Palette (View>Command Palette or <ctrl>-<shift>-P).
  • Enter the command Git: Clone and select Clone from GitHub
    • Follow the instructions to log into GitHub, if asked
  • You will be presented a list of repositories in your account; select the one you want to clone
    • Alternatively, you can simply paste the URL of the repository you want to clone.
  • Set the folder where the repository should go on your local machine, e.g. the V: drive.

When complete, the remote repository should be on your machine, fully connected with Git!
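The terminal equivalent is git clone followed by the repository URL and, optionally, a destination folder. In this sketch a local bare repository with one commit stands in for a GitHub repository so the example runs offline (an assumption); with GitHub you would use a URL such as https://github.com/<username>/<repository>.git, where both placeholders are yours to fill in.

```shell
#!/usr/bin/env bash
set -e

WORK="$(mktemp -d)"

# Stand-in for a GitHub repository (assumption): a bare repo with one commit
git init -q --bare "$WORK/TurtleTracking.git"
git clone -q "$WORK/TurtleTracking.git" "$WORK/seed"
cd "$WORK/seed"
git config user.name "Example Student"
git config user.email "student@example.com"
echo "Turtle tracking project" > ReadMe.txt
git add ReadMe.txt
git commit -q -m "Initial commit of ReadMe.txt file"
git push -q origin HEAD

# The clone itself; with GitHub, pass the repository URL instead of a path
git clone -q "$WORK/TurtleTracking.git" "$WORK/TurtleTracking"

# The new folder is a full Git repository already linked to its remote
cd "$WORK/TurtleTracking"
git remote -v
git log --oneline
```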