#Git #version_control #order #time #content_addressable #hyperlink
[[Git]] is a [[DVCS|distributed version control system]] (DVCS) that lets developers track changes in their codebase and collaborate with others. It is widely used in software development to manage source code and coordinate work among team members. A similar tool worth investigating is [[fossil]], created by Dr. [[Richard Hipp]], who also created [[SQLite]].
In the context of Data Asset Management (DAM), Git can be utilized to manage data assets along with the associated code. This means that not only can developers track changes made to the code, but they can also track changes made to the data files. By storing and versioning data assets in Git repositories, teams can ensure that everyone is working with the same dataset and easily roll back to previous versions if needed. This becomes particularly useful in scenarios where data pipelines or machine learning models are involved, as it allows for better reproducibility and traceability of results.
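A minimal sketch of rolling a data file back to an earlier version. It assumes a hypothetical `data.csv` tracked in a throwaway repository; the file name, its contents, and the placeholder identity are illustrative only:

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no existing project is touched.
workdir=$(mktemp -d)
cd "$workdir"
git init -q
# Placeholder identity, stored only in this repository's local config.
git config user.name "Example User"
git config user.email "user@example.com"

# Version 1 of the hypothetical dataset.
printf 'id,value\n1,10\n' > data.csv
git add data.csv
git commit -q -m "Add dataset v1"

# Version 2 changes one value.
printf 'id,value\n1,99\n' > data.csv
git commit -q -am "Update dataset to v2"

# Restore the file as it was one commit earlier.
git checkout -q HEAD~1 -- data.csv
restored=$(cat data.csv)
echo "$restored"
```

After the last command the working copy of `data.csv` holds the v1 contents again, while the history still records both versions.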
In [[DevOps]], Git underpins [[CICD|continuous integration and continuous delivery]] (CI/CD), the part of the DevOps methodology that emphasizes frequent code integration, testing, and deployment. Git lets developers maintain separate branches for individual features or bug fixes, so they can work on isolated tasks without disrupting the main development branch. These branches are then merged back into the main branch, typically through pull requests on a hosting platform, after review and testing.
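The branch-and-merge flow described above can be sketched locally as follows. The branch name `feature/greeting`, the file names, and the identity are hypothetical, and the final merge stands in for what a pull request would do on a hosting platform:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q
# Name the initial branch "main" regardless of the Git version's default.
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Do isolated work on a feature branch.
git checkout -q -b feature/greeting
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"

# Back on main, the feature's file does not exist yet.
git checkout -q main
ls

# Merge the feature; --no-ff records an explicit merge commit.
git merge -q --no-ff -m "Merge feature/greeting" feature/greeting
merge_count=$(git rev-list --merges --count HEAD)
git log --oneline --graph
```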
Moreover, Git integrates seamlessly with various CI/CD tools like Jenkins, Travis CI, or CircleCI. These tools automate build processes, run tests, and deploy applications based on changes detected in the Git repositories. By leveraging Git's version control capabilities within a DevOps workflow, teams can achieve faster iterations, improved collaboration between developers and operations teams, and ultimately deliver higher-quality software products.
See [[Learn Git with Category Theory|Learn Git with Category Theory: a course in ABC curriculum]]
## Git compresses data to reduce storage sizes
Git reduces storage through content-addressable storage and file compression, not through symbolic links:
Content-addressable storage (often mistaken for [[Symbolic Link|Symbolic Links]]):
1. Git stores each version of a file as a blob object named by the hash of its contents. When a file has not changed between commits, the new commit's tree simply references the existing blob, so no duplicate is created. No symbolic links are involved.
2. When a file is modified, Git first stores the new version as a complete new blob (a snapshot, not a diff). Only later, when objects are packed into packfiles (for example by `git gc`), does Git store similar objects as deltas against one another to save space.
File Compression:
1. Git compresses its objects, including blobs (file contents), trees (directory structure), and commits.
2. Git uses the [[zlib]] library, a widely used implementation of the DEFLATE compression algorithm.
3. Compression reduces object size by encoding redundant data in a more compact form.
4. Sharing common parts between versions of a file is the job of delta compression in packfiles, not of zlib itself; zlib compresses each object (or delta) individually.
5. Git compresses objects automatically when storing them and decompresses them when they are accessed.
By combining content-addressed objects, delta compression in packfiles, and zlib compression, Git avoids storing duplicate content and keeps repositories compact. This allows for efficient version control without consuming excessive disk space.
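The following sketch, run in a throwaway repository with hypothetical file names, illustrates how an unchanged file is stored only once: both commits refer to the same blob ID, and that ID is derived from the file contents alone.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable content" > unchanged.txt
echo "first draft" > changing.txt
git add .
git commit -q -m "First snapshot"

# Modify only one of the two files.
echo "second draft" > changing.txt
git commit -q -am "Second snapshot"

# Blob IDs of each file as recorded in the two commits.
old_blob=$(git rev-parse HEAD~1:unchanged.txt)
new_blob=$(git rev-parse HEAD:unchanged.txt)
old_changed=$(git rev-parse HEAD~1:changing.txt)
new_changed=$(git rev-parse HEAD:changing.txt)
echo "unchanged.txt: $old_blob -> $new_blob"
echo "changing.txt:  $old_changed -> $new_changed"

# The ID can be recomputed from the working file's contents.
recomputed=$(git hash-object unchanged.txt)

# Pack loose objects; delta compression happens inside the packfile.
git gc -q
git count-objects -v
```

`git count-objects -v` reports how many objects are loose and how many are packed, which is one way to observe the effect of `git gc`.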
## Who invented Git?
Git was invented by [[Linus Torvalds]], the same person who created the Linux kernel. He developed Git in 2005 as a distributed version control system to manage the Linux kernel development process.
In summary, Git serves as an integral tool for both [[Data Asset Management]] and DevOps practices. It enables efficient version control of both code and data assets in DAM scenarios while supporting collaborative development workflows and facilitating CI/CD processes in DevOps environments. It is also an ideal platform to learn and practice [[Category theory]].
# How to initiate a Git Project on GitHub?
### 1. Create a Repository on GitHub:
- Log in to your GitHub account.
- Click the "New" or "Create" repository button.
- Give your repository a descriptive name.
- Choose whether you want to keep it public or private.
- Optionally, add a description and initialize with a README file.
- Click "Create repository."
See [Quickstart for repositories](https://docs.github.com/en/github-ae@latest/repositories/creating-and-managing-repositories/quickstart-for-repositories) on docs.github.com for creating a new repository on GitHub.
### 2. Create a Local Git Repository:
- Open a terminal or command prompt on your computer.
- Navigate to the directory where you want to create your project.
- Type `git init` and press Enter. This creates a hidden .git folder to track your project's files.
### 3. Add Files to Your Local Repository:
- Create the files and folders you want to include in your project.
- Use `git add .` to add all files in the current directory to the staging area.
- You can also add specific files or folders using `git add <file/folder>`.
### 4. Commit Your Changes:
- Type `git commit -m "Initial commit"` to commit the changes to your local repository.
- Replace "Initial commit" with a descriptive message about the changes you made.
### 5. Link Your Local Repository to GitHub:
- Copy the remote repository URL from GitHub.
- Type `git remote add origin <remote_repository_URL>` in your terminal.
- Replace `<remote_repository_URL>` with the actual URL you copied.
### 6. Push Your Changes to GitHub:
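- Type `git push -u origin main` to push your local commits to the main branch on GitHub. If your local branch has a different name (for example `master`), rename it first with `git branch -M main`.
- If prompted for credentials over HTTPS, enter your GitHub username and a personal access token; GitHub no longer accepts account passwords for Git operations. Alternatively, use an SSH remote URL.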
### 7. Check Your Repository on GitHub:
- Visit your repository on GitHub to see the uploaded files and commit history.
- You can now collaborate with others, track changes, and manage your project effectively using Git and GitHub.
**Additional Tips:**
- Use `git status` to check the status of your files and changes.
- Use `git branch` to create and manage different branches for your project.
- Use `git pull` to fetch and merge changes from remote repositories.
- Explore Git's extensive features and commands for more advanced version control workflows.
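Steps 2 to 6 can be sketched as one script. To keep it runnable without a GitHub account, a local bare repository (`remote.git`, a hypothetical stand-in) plays the role of the GitHub repository; with a real project you would pass the URL copied from GitHub to `git remote add` instead. The project name and identity are placeholders:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the repository created on GitHub in step 1.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Step 2: create the local repository.
mkdir my-project
cd my-project
git init -q
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

# Step 3: add files to the staging area.
echo "# My Project" > README.md
git add .

# Step 4: commit.
git commit -q -m "Initial commit"

# Step 5: link the local repository to the remote.
git remote add origin "$workdir/remote.git"

# Step 6: push and set the upstream branch.
git push -q -u origin main

# Step 7: confirm the remote now has the commit.
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$workdir/remote.git" rev-parse main)
git status --short --branch
```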
# How to Shallow Clone a large Git Repo?
A shallow clone truncates the downloaded history. The `--depth` flag controls this, and `--shallow-submodules` extends it to submodules:
**1. Using the `--depth` flag:**
This is the most common and straightforward way to shallow clone. Add the `--depth` flag to your `git clone` command followed by the desired depth (the number of commits to keep, counted back from the tip of the branch). For example, to clone only the last 100 commits:
```
git clone --depth 100 git@github.com:username/repo.git
```
**2. Using the `--shallow-submodules` flag:**
This flag applies to repositories with submodules (nested Git repositories) and takes effect when submodules are cloned, for example with `--recurse-submodules`. Each submodule is then cloned with a depth of 1, so you avoid downloading the entire history of all submodules. It is typically combined with `--depth` for the top-level repository:
```
git clone --depth 1 --recurse-submodules --shallow-submodules git@github.com:username/repo.git
```
Here are some additional things to keep in mind when shallow cloning:
- Shallow clones download only the specified number of commits and their associated objects. This can significantly reduce the download size and cloning time, especially for large repositories.
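- However, you won't have the full history of the repository locally. Commits older than the shallow boundary are unavailable, so operations that need them, such as `git log` or `git blame` over the full history or checking out an older version, are limited until you fetch more history.
- You can deepen a shallow clone later with `git fetch --depth <n>` or `git fetch --deepen <n>`, or retrieve the complete history with `git fetch --unshallow`.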
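A small sketch of the behaviour described above, using a locally created five-commit repository as a hypothetical stand-in for a large remote. The `file://` URL is used on purpose: when cloning from a plain local path, Git ignores `--depth`.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build a source repository with five commits.
git init -q source
cd source
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3 4 5; do
  echo "change $i" > file.txt
  git add file.txt
  git commit -q -m "Commit $i"
done
cd "$workdir"

# Shallow clone that keeps only the two most recent commits.
git clone -q --depth 2 "file://$workdir/source" shallow
cd shallow
shallow_count=$(git rev-list --count HEAD)
echo "after clone --depth 2: $shallow_count commits"

# Extend the history by two more commits.
git fetch -q --deepen=2
deepened_count=$(git rev-list --count HEAD)
echo "after fetch --deepen=2: $deepened_count commits"

# Retrieve everything that is still missing.
git fetch -q --unshallow
full_count=$(git rev-list --count HEAD)
echo "after fetch --unshallow: $full_count commits"
```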
# Advanced Visualization in Git with color
See [[watch#Using watch with git#Advanced Visualization]]
## Tips on Obsidian
When working with the Obsidian Git plugin, it is important to avoid automatic commits. This can be done by setting the interval for automatic updates to 0.
# Important Global Configuration for pull --rebase
To make `git pull --rebase` your default behavior, use the following command in your terminal (see [this YouTube short](https://www.youtube.com/shorts/gMZ8IWBo-Hc) by [Nurlan Valizada](https://www.youtube.com/channel/UCBg6sAU9CJVhyNvVdSJ6CVA)):
```bash
git config --global pull.rebase true
```
This tells Git to always use rebasing instead of merging when you pull changes from a remote repository.
**Why use `git pull --rebase`?**
1. **Cleaner commit history:** Rebasing avoids creating unnecessary merge commits, resulting in a more linear and easier-to-follow project history.
2. **Easier to resolve conflicts:** Rebasing applies your local commits on top of the latest remote changes, making conflicts easier to identify and resolve individually.
3. **Simplified collaboration:** When working in teams, rebasing helps avoid complex merge conflicts and keeps the history clean for everyone involved.
4. **Mimics local development:** Rebasing essentially simulates the process of working on a local branch and then applying your changes on top of the latest version, ensuring your code is always up-to-date.
**Additional tips:**
- If you want to preserve merge commits while rebasing, use `git config --global pull.rebase merges`.
- To have Git automatically stash uncommitted local changes before a rebase and reapply them afterwards, use `git config --global rebase.autoStash true`. This does not resolve conflicts automatically.
- You can always override the global setting by using `git pull --no-rebase` for a specific pull.
**Caution:**
- Rebasing can rewrite commit history, so it's generally not recommended for shared branches that others are working on.
- Be careful when force-pushing rebased branches, as it can overwrite the remote history for others.
By setting `git pull --rebase` as your default behavior, you can significantly improve your Git workflow, making it more efficient and less prone to conflicts.
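The effect on history can be sketched with two hypothetical clones, `alice` and `bob`, sharing a local bare repository as the remote. For the sake of a self-contained example the setting is applied per repository (without `--global`); all names and file contents are illustrative:

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Shared remote, standing in for a hosted repository.
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Alice creates the first commit and publishes it.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice"
git config user.email "alice@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git remote add origin "$workdir/remote.git"
git push -q -u origin main

# Bob clones, enables rebase-on-pull, and commits locally.
cd "$workdir"
git clone -q "$workdir/remote.git" bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
git config pull.rebase true
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob: local work"

# Meanwhile Alice pushes another commit.
cd "$workdir/alice"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice: remote work"
git push -q origin main

# Bob pulls; his commit is replayed on top of Alice's.
cd "$workdir/bob"
git pull -q
merges=$(git rev-list --merges --count HEAD)
top_subject=$(git log -1 --format=%s)
git log --oneline
```

With the default merge behavior, the same pull would instead have recorded a merge commit joining the two lines of work.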
# The Complexity of Git
The structure of Git is best described by the following model:
![[@GitAsHomeomorphicEndofunctors#The Mathematical Model of Git Branch]]
# References
```dataview
Table title as Title, authors as Authors
where contains(subject, "Git")
```
# Notes
[[@geekhourLiangXiaoShiGitRuMenJiaoCheng2023|两小时Git入门教程 (Two-Hour Introductory Git Tutorial)]] by [[GeekHour]]
![[@HowGitWorks2023]]
![[@d-i-ryGitGraphEasiest2021|Git Graph: the easiest way to start using git]]