Cleaning Up GitHub (for Data Science) | by S. T. Lanier | Jan, 2021


GitHub, from its conception, has provided relatively minimal structure. It lacks a true directory (folder) structure for organizing repositories. For example, the vast majority of my 339 repositories are from Flatiron School’s Data Science Bootcamp, but I can’t just throw them all into a “Flatiron” folder, because GitHub has no such folders. There are a number of options for mimicking that structure, though, each requiring some degree of bastardization of the tool’s original purpose.

For the purpose of cleaning up your repositories page, I think git subtree beats git submodule, though I’m sure someone out there could make better use of git submodule. For a fairly detailed discussion of the differences between the two, read here, here, and here, but the main difference is this:

  • submodule leaves a pointer inside the outer repository that references a specific commit in the inner repository (it doesn’t move the inner repo inside the outer repo the way we think of a file moving into a folder). It has a dedicated git command, git submodule, which makes it easy to set up but hard to maintain afterwards;
  • subtree, on the other hand, actually brings the inner repo’s code (and, unless squashed, its history) into the outer repo, like moving a file into a folder. It is distributed as a contrib script rather than as one of git’s core commands, which makes it a little harder to set up but easier to maintain afterwards.

Both methods can quickly become time-consuming if you try to apply them across hundreds of repos, but with either one you get to keep your contributions graph.
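If you do go the submodule or subtree route across many repos, a small script can at least generate the commands for you. The following is a minimal Python sketch, not a tested migration tool. It assumes the commands are run inside an existing outer repo that has at least one commit and a clean working tree, that each inner repo’s branch is `main` (many older repos use `master`), and that your git installation includes the `git subtree` contrib command. The function names and the example URL are hypothetical.

```python
import subprocess
from typing import List, Sequence


def folder_name(url: str) -> str:
    """Derive a folder name from a repo URL, e.g. .../dsc-lesson-1.git -> dsc-lesson-1."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


def submodule_cmd(url: str, folder: str) -> List[str]:
    # The outer repo records only a pointer to one commit of `url`, at path `folder`.
    return ["git", "submodule", "add", url, folder]


def subtree_cmd(url: str, folder: str, branch: str = "main", squash: bool = True) -> List[str]:
    # Merges the files of `url` at `branch` into the subdirectory `folder`.
    cmd = ["git", "subtree", "add", f"--prefix={folder}"]
    if squash:
        # Collapse the inner repo's history into a single commit.
        cmd.append("--squash")
    return cmd + [url, branch]


def plan(urls: Sequence[str], method: str, branch: str = "main") -> List[List[str]]:
    """Return one git command per inner repo, for either 'submodule' or 'subtree'."""
    cmds = []
    for url in urls:
        folder = folder_name(url)
        if method == "submodule":
            cmds.append(submodule_cmd(url, folder))
        elif method == "subtree":
            cmds.append(subtree_cmd(url, folder, branch))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return cmds


def run(cmds: Sequence[List[str]], repo_dir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them inside the outer repo."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing command instead of carrying on.
            subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical repo URL; replace with your own list of repos.
    urls = ["https://github.com/example-user/dsc-lesson-1.git"]
    run(plan(urls, "subtree"), repo_dir=".", dry_run=True)
```

As a dry run, the example only prints `git subtree add --prefix=dsc-lesson-1 --squash https://github.com/example-user/dsc-lesson-1.git main`. Note that `git subtree add` commits each merge itself, whereas `git submodule add` only stages its changes, so the submodule route still needs a final commit. Dropping `--squash` brings the inner repo’s full history along instead of a single commit.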

Just from the name, GitHub Projects sounds promising: my instinct on hearing it is something like “I get to group multiple repositories together under a single project.” In practice, its functionality is much closer to a to-do list for keeping track of issues, pull requests, and notes. Still, I have heard of people using this feature as a way to organize repositories (up to 5 repos per project).

GitHub organizations were the winner for me. You can create an organization to group repos under, and moving repos into it 1) removes them from your repositories page and 2) lists the organization on your profile instead. In that important regard, this is arguably the closest you can come to mimicking a directory structure on GitHub. It looks like this:

New organization circled in red, bottom left corner. Image by author.
Hover effect for organizations. Image by author.

Hovering your mouse over the organization’s icon shows some information about it. The image still says I have some 300 repositories, but that’s because I haven’t moved most of them over to the new organization yet. Transferring a repository into an organization, even one you own, removes the repository from the list on your profile page and removes any contributions made to it from your contributions graph. For me, this was a small price to pay for a place to keep these 300 repositories tucked away together and out of sight, but I’m sure for some this is the worst of both worlds: the repos are still around, and the contributions graph is gone.
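Moving a few hundred repos one at a time through each repo’s Settings page is tedious. GitHub’s REST API has a “transfer a repository” endpoint (`POST /repos/{owner}/{repo}/transfer` with a `new_owner` field), so the move can be scripted. The Python sketch below is an illustration, not the method I used: it assumes a personal access token allowed to transfer your repos and to create repositories in the target organization, read from a hypothetical `GITHUB_TOKEN` environment variable, and the user, repo, and organization names are placeholders. It defaults to a dry run that only prints what it would do.

```python
import json
import os
import urllib.request
from typing import Iterable, List
from urllib.parse import quote

API = "https://api.github.com"


def build_transfer_request(owner: str, repo: str, new_owner: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /repos/{owner}/{repo}/transfer request."""
    url = f"{API}/repos/{quote(owner)}/{quote(repo)}/transfer"
    body = json.dumps({"new_owner": new_owner}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transfer_all(owner: str, repos: Iterable[str], new_owner: str, token: str,
                 dry_run: bool = True) -> List[int]:
    """Transfer each repo to `new_owner`; returns HTTP status codes (empty on a dry run)."""
    statuses: List[int] = []
    for repo in repos:
        req = build_transfer_request(owner, repo, new_owner, token)
        if dry_run:
            print(f"would POST {req.full_url} -> {new_owner}")
            continue
        # urlopen raises HTTPError on 4xx/5xx, e.g. a bad token or missing permission.
        with urllib.request.urlopen(req) as resp:
            # A successful request returns 202 Accepted; GitHub finishes the move asynchronously.
            statuses.append(resp.status)
    return statuses


if __name__ == "__main__":
    # Placeholder names; replace with your own user, repos, and organization.
    transfer_all(
        owner="example-user",
        repos=["dsc-lesson-1", "dsc-lesson-2"],
        new_owner="example-flatiron-org",
        token=os.environ.get("GITHUB_TOKEN", ""),
        dry_run=True,
    )
```

Only after checking the dry-run output would you pass `dry_run=False`. Keep in mind that each real transfer has the side effect described above on your contributions graph.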

For an even better example, check out this article by Andrei Cioara.


