# Overview
I aim to integrate [[Open Science]] best practices directly into my workflow. This is not just about making a commitment to open science; it also makes things easier on ourselves when we need to re-learn and change old projects, or to make future changes that require re-generating all results. Here, I outline the tools and approaches I use to organize information about projects.
# Literature
- I manage academic literature through [Zotero](https://www.zotero.org/) with the appropriate [browser connector](https://www.zotero.org/download/connectors) for quickly grabbing paper metadata.
- I jot down notes on the PDF/paper itself, but after reading an article I try to extract those notes and summarize the article as a page in my [[Obsidian]] repository. Tags are added to each note, and notes are roughly organized into folders by general subject area.
- For reading books, I strive (yet often fail) to follow Paul Edwards' guide ["How to Read a Book"](https://pne.people.si.umich.edu/PDF/howtoread.pdf).
# Data management
- I store most files in Dropbox, for which I pay a premium subscription. All projects have a Dropbox folder for large files, binaries, and assets such as presentations, i.e., things that should not be committed to a GitHub repository.
- Very large files and database-organized data I store on Google Cloud, in particular Google BigQuery, which I can query directly from analysis code (see the sketch below).
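
As a sketch of what this looks like in practice, the snippet below pulls a summary from a BigQuery table using the official `google-cloud-bigquery` Python client. The project, dataset, and table names are hypothetical placeholders, not my actual setup.

```python
# A minimal sketch of querying BigQuery from an analysis session.
# Requires the google-cloud-bigquery package and authenticated credentials
# (e.g., via `gcloud auth application-default login`).
# The project/dataset/table names below are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

query = """
    SELECT year, COUNT(*) AS n_papers
    FROM `my-gcp-project.my_dataset.papers`
    GROUP BY year
    ORDER BY year
"""

# Run the query and iterate over the result rows.
for row in client.query(query).result():
    print(row.year, row.n_papers)
```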
# Analytical projects
- I maintain a separate [[GitHub]] repository for every project. I generally start by forking [this project template](https://github.com/murrayds/project-template), which has a place for organizing just about everything needed over a project's lifecycle.
- I also maintain a separate [[Obsidian]] folder for each project, which contains notes, ideas, and summaries.
- I do initial analysis in Jupyter and R notebooks. My choice of Python or R depends on the use case: I'm generally more comfortable with data processing and visualization in R, but turn to Python when a task calls for its libraries. Usually, I have `scratch.ipynb` and `scratch.rmd` files that I leave untracked in Git (see the `.gitignore` sketch after this list).
- Once I am happy with the direction of an analysis, I convert the notebook into a dedicated script.
- I strive to have a [[Snakemake]] pipeline that automates the entire workflow, from raw data to final publication-ready figures (a minimal sketch follows this list).
- I try to make ample use of [[Conda]] environments to make code reproducible (an example environment file appears after this list). As a next step, I'm also trying to use [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) to containerize my code.
- I attempt to do as much as I can locally, but when that is not feasible, I run jobs on the [[Discovery SuperComputer]] (see the batch-script sketch after this list).
- In general, I strive to adhere to the [Documentation is Automation](https://cacm.acm.org/magazines/2018/6/228040-documentation-is-automation/abstract) philosophy: even where automating with code is not feasible, appropriate documentation of your methodology and approach is essential.
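
As mentioned above, the scratch notebooks stay out of version control. A `.gitignore` along these lines (using the file names from the list above) does the trick:

```
# .gitignore: keep exploratory scratch files out of the repository
scratch.ipynb
scratch.rmd

# Jupyter checkpoint directories
.ipynb_checkpoints/
```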
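For reference, a minimal Snakemake sketch of that raw-data-to-figure flow might look like the following. The data files and script names here are hypothetical placeholders, not the actual template layout.

```python
# Snakefile: a minimal sketch of a raw-data-to-figure pipeline.
# All file and script names are hypothetical placeholders.

# The default target; building it triggers the whole chain below.
rule all:
    input:
        "figures/figure1.pdf"

# Clean the raw data into a derived, analysis-ready table.
rule clean_data:
    input:
        "data/raw/records.csv"
    output:
        "data/derived/records_clean.csv"
    script:
        "scripts/clean_data.R"

# Produce a publication-ready figure from the cleaned data.
rule figure1:
    input:
        "data/derived/records_clean.csv"
    output:
        "figures/figure1.pdf"
    script:
        "scripts/figure1.R"
```

Running `snakemake --cores 1` then rebuilds only whatever is out of date, all the way from raw data to the final figure.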
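A project-level Conda environment file, checked into the repository, pins the software stack so that others (and future me) can recreate it. A hypothetical example, with placeholder name and packages:

```yaml
# environment.yml: a hypothetical project environment.
# Recreate it with `conda env create -f environment.yml`.
name: myproject
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - r-base
  - snakemake-minimal
  - pandas
  - jupyter
```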
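When a job does move to the cluster, it gets submitted as a batch script. Below is a generic Slurm sketch, assuming a Slurm scheduler and a module-based Anaconda install; the module name, resource numbers, and environment name are assumptions to adapt, not Discovery specifics.

```bash
#!/bin/bash
# A generic Slurm batch script sketch. The resource values, the
# anaconda module name, and the environment name are assumptions.
#SBATCH --job-name=analysis
#SBATCH --time=04:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

# Load Conda and activate the project environment (names hypothetical).
module load anaconda3
source activate myproject

# Run the pipeline with the allocated cores.
snakemake --cores "$SLURM_CPUS_PER_TASK"
```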
# Writing
- I usually do initial writing in Google Docs because it is so easy to access and use.
- When the writing gets at all serious, I switch to LaTeX. While the standard [project template](https://github.com/murrayds/project-template) has a spot for these files, I still prefer working in Overleaf in most cases, due to its accessibility, its collaboration features, and the fact that it hides much of the technical detail of LaTeX compilation that so often causes headaches.
# Collaboration
- I attempt to leave a paper trail for questions, which makes it easier to recall decisions later and to come to a consensus about priorities. Email is great for handling simple questions; Slack works for general chatter and back-and-forth.
- Specific analytical questions and bugs should be handled through the [[GitHub#Issues|GitHub issues]] system.