# Overview

Git is a tool for version control: it tracks changes made to a set of files, maintains a history of those files, and makes it easier for teams to maintain a single code base without introducing conflicting code.

A directory of files managed by Git is called a [**repository**](https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository). After the repository is initialized, Git automatically detects changes made to its files. Changes can be **staged**, which marks them to be added to the repository history, and then [**committed**](https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository), which actually adds them. Commits are made to a [**branch**](https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-Nutshell), which is an independent line of commits that can later be [**merged**](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging) into another branch. When team members each commit to their own branches and merge only after proper testing and review, it is easier to avoid breaking changes and conflicts.

![[Pasted image 20230914100249.png]]

# Typical workflow

First, initialize the repository and use `git status` to check which branch you are on and which changes are detected. Record an initial commit so that the starting branch exists and can be merged into later.

```bash
cd my_repository/
git init
git status
# Record an initial (empty) commit so the starting branch exists
git commit --allow-empty -m "initial commit"
```

Create a new branch and check it out. Running `git status` afterward should show it as the active branch.

```bash
git checkout -b new_branch
```

Make changes to the files in the project, stage them, and commit them to the branch.

```bash
# Make a change
touch new_file.txt
# The change is detected, but unstaged
git status
# Stage the file
git add new_file.txt
# Make the commit
git commit -m "added new_file.txt"
```

Now, merge back into the main branch.

```bash
# Return to the branch we started on
# (it may be named master, depending on your Git configuration)
git checkout main
git merge new_branch
```

Ideally, the merge goes through without issue.
However, occasionally there are **merge conflicts**, meaning that the changes made on one branch conflict with other recent changes. When a merge conflict happens, it must be resolved by manually selecting which changes are the "real" ones. Check the [documentation](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging) for how to resolve conflicts.

# Tips and best practices

- Files that you never want to track can be ignored by adding their names to a file called `.gitignore` in the repository.
- Git compares changes line by line. In general, this means that the only files that should be committed are those with "lines", such as raw text or code. Binary files, like images or PDFs, don't have lines, so every commit that touches one replaces the entire file rather than editing a few lines. The result is that (a) the Git history is not very informative, and (b) the repository takes more storage, as multiple copies of (often large) binary files are stored.
- Jupyter notebooks change with every execution, so committing them with their outputs produces large, noisy diffs. Be sure to clear notebook outputs before each commit.
- Branches and commits should be made frequently, even (or *especially*) for very small changes. Doing so makes it easier to isolate issues and revert.
- The `main`/`master` branch should always contain production-level code. That is, it should always work. Any change should first be made on a separate branch and merged into the main branch only after proper review.
- Commit messages should be short but descriptive, because they help identify specific changes in the repository history.
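
As a minimal sketch of the `.gitignore` tip above, the following sets up a throwaway repository and shows ignored files dropping out of `git status`. The file names (`analysis.py`, `scratch.log`, `data/raw.bin`) and the two patterns are hypothetical and chosen only for illustration; a real project would list whatever it should not track.

```bash
# Work in a temporary directory so no real repository is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Hypothetical files: one worth tracking, two that are not
touch analysis.py scratch.log
mkdir -p data
touch data/raw.bin

# Each line of .gitignore is a pattern:
# "*.log" matches by extension, "data/" matches a whole directory
printf '%s\n' '*.log' 'data/' > .gitignore

# Ignored files no longer show up as untracked
git status --short

# check-ignore exits with status 0 when a path is ignored
git check-ignore -q scratch.log && echo "scratch.log is ignored"
```

Note that `.gitignore` itself is an ordinary file and is usually committed, so that everyone working on the repository ignores the same paths.
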

# Resources

- [Main Git documentation](https://git-scm.com/)
- [Learn on Codecademy](https://www.codecademy.com/learn/learn-git)
- [Fun visual tutorial of Git commits and branching](https://learngitbranching.js.org/?locale=en_US)
- [Atlassian's tutorial on Git](https://www.atlassian.com/git)
- [Git best practices from GitLab](https://about.gitlab.com/topics/version-control/version-control-best-practices/)

# Trivia

- Git was created by Linus Torvalds to help manage development of the Linux kernel.