All > Journals > CVu > 311
Title: GitHub’s Crazy Contribution-Graph Game
Author: Bob Schmidt
Date: Sun, 03 March 2019 15:51:42 +00:00
Summary: Silas S. Brown does a one-year streak.
Body:
For years I had been happy to post my public code on my personal home page, which is mostly hand-coded in HTML and has been in my public_html directory since before the turn of the century (although the external URL had to change a couple of times, which makes things harder for Internet Archive’s Wayback Machine). Google seems to have no problem finding my pages.
But nowadays some people don’t even know what a personal home page is, because they’re so used to individuals’ Internet presences being confined to large recognised social-media platforms, like Facebook (reputedly for people who like to mess around and air family feuds in public, and I wouldn’t dare take out an account there and subject my creations to the whims of whatever crazy algorithm they come up with next, but that’s just my opinion), Twitter (reputedly not as bad as Facebook but sometimes I wonder), and LinkedIn, which is supposedly ‘for work’ – it’s not blocked in China and they have a Chinese name ‘LingYing’ which means ‘leading heroes’ – but I for one haven’t yet seen any real benefits from being on it, apart from occasionally being able to get back in touch with people whose email addresses have changed; I opened a private account on it to write a recommendation for someone who asked for one, and since then I’ve accepted connection requests from people I know (unless they got lost in the spam), but any actually-interesting news they post tends to be drowned in spammy ‘work is great’ articles generated by the platform itself, which feels like it’s trying to control my life instead of help.
Give me my public_html over that any day, apart from the small detail of some people not even knowing what I’m talking about when I say something is on my home page. I was once asked if ‘home page’ means ‘blog’, and I said ‘sort-of’, although the closest I’ve come to running it as a blog is to write an ugly shell script that automatically lists all page titles and makes an RSS feed; I only did that as a convenience for when I needed to access my notes quickly from a mobile device, and it doesn’t track when pages are corrected. And the only way to comment is to email me. Very old-school. (An old classmate predicted I’ll eventually be waving a walking-stick shouting “young people don’t know how to write HTML!”)
I did, however, start checking things in to a SourceForge account when someone made me a subdirectory in his project and invited me to check in my stuff there, saying it would be convenient for people to see what I’ve changed before they update. I had previously thought “people can just check my home page to see if I’ve changed the version number since they last downloaded, and if they want a more detailed comparison they can run diff themselves”, but I supposed a repository does give added convenience even when there’s only one contributor, plus having my code in version control makes me less frightened of deleting old parts instead of commenting them out. I did not, however, trust the SourceForge platform enough to be willing to ‘live’ there: my home page stayed where it was, with the repository being very much a secondary ‘mirror’ of my code without the HTML explanations that accompany it. I have an automatic script to fetch the latest code from my home page and update the repository, although unfortunately that means most of my commit summary messages just say ‘update’, which is not very informative. (See cartoon below. It is from the xkcd comic, which can be found at https://xkcd.com.)
GitHub for the boardroom?
After SourceForge descended into an advertisement-ridden mess (fake Download buttons became particularly obnoxious), most projects switched to Git and GitHub, along with its later competitors GitLab and BitBucket (now owned by Atlassian). There are of course other Git hosting providers, like Assembla and Beanstalk, but they don’t tend to have free tiers in their pricing models and are therefore less popular in the ‘open source’ world. I had already used Git on some closed-source projects (mostly if others I was working with wanted to use Git, or as a convenience to track old versions locally), but I hadn’t bothered hosting anything on GitHub.
Then in late 2017 I was working part-time for a startup (it had been spun off from someone’s university research and I ended up getting involved), and someone there said he was trying to pitch the company to investors and said those investors wanted to see employees’ personal projects on their GitHub profiles. It seemed he thought the company’s work projects might not be sufficiently impressive so better show off the personal projects of employees as well, and it must be GitHub because that’s the only platform the investors had heard of. So I obligingly ported my Subversion directory into a bunch of GitHub projects and expanded my ‘update’ scripts to mirror to both. (Some projects are only on Git as I didn’t bother adding them in to the Subversion script, but anything I previously had on Subversion continues to be updated there as well.)
The company was sold soon after, but I kept going with GitHub in case it was convenient to anyone, and later added GitLab and BitBucket ‘just in case’ after GitHub was acquired by Microsoft and the general atmosphere of apprehension set in. Git lets you attach multiple push URLs to the ‘origin’ remote so that a single ‘git push’ will update multiple remote servers (see Listing 1), which lets me keep the proverbial foot in all three camps for now.
git remote set-url origin --push --delete .
git remote set-url origin --push git@github.com:$USER/$REPO.git
git remote set-url origin --push --add git@gitlab.com:$USER/$REPO.git
git remote set-url origin --push --add git@bitbucket.org:$USER/$(echo $REPO|tr A-Z a-z).git
Listing 1
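One way to check that a setup like Listing 1 has taken effect is to ask Git to list the push URLs it has stored. The sketch below does this in a throwaway repository; the account name `example-user` and project name `Example-Repo` are hypothetical placeholders, and the options are written before the remote name, as in the git-remote manual. It assumes Git 2.7 or later, which is when `git remote get-url` appeared.

```shell
# Hypothetical names for illustration; substitute your own account and project.
GH_USER=example-user
REPO=Example-Repo

# Work in a throwaway repository so nothing real is touched.
demo_dir=$(mktemp -d)
git init -q "$demo_dir"
cd "$demo_dir"
git remote add origin "git@github.com:$GH_USER/$REPO.git"

# Equivalent of Listing 1, minus the initial --delete
# (a freshly added remote has no push URLs to clear yet).
git remote set-url --push origin "git@github.com:$GH_USER/$REPO.git"
git remote set-url --add --push origin "git@gitlab.com:$GH_USER/$REPO.git"
git remote set-url --add --push origin "git@bitbucket.org:$GH_USER/$(echo "$REPO" | tr A-Z a-z).git"

# List every push URL stored for origin, one per line.
push_urls=$(git remote get-url --push --all origin)
echo "$push_urls"
```

If all three URLs are listed, a plain ‘git push’ should attempt each of them; `git remote -v` shows the same information in a more compact form.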
However, I couldn’t help feeling a tiny little bit worried about what he’d said – the thought of those investors, who don’t know what a home page is, trying to judge an individual’s contributions on GitHub. If they don’t know what a home page is, would they really be able to look at code and see if it’s any good? Or are they just looking at numbers, metrics like Figure 1?
Figure 1 shows my GitHub contribution graph at the time of writing. It can be found at http://github.com/ssb22 when the site is set to Desktop mode (they don’t show these graphs in Mobile mode). The top graph is with GitHub set to show public commits only; the bottom is with GitHub set to show private commits as well. If you look carefully, you will notice that some squares which are shaded darkly on the ‘public commits only’ graph are actually shaded less darkly on the graph with private commits as well: it seems the scale of the shading is adjusted to reflect the maximum number of commits in any one day over the period shown, so a high number of commits on one particular day can make the rest of the graph look flatter for a year. It’s a pity they didn’t take more of an average when calibrating this scale; I would have coded ‘the median of all non-zero days’ if it had been my job (although I would probably have opposed the implementation of the graph in the first place).
Figure 1
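To make the calibration point concrete, here is a small sketch comparing the two candidate scale ceilings on made-up data. The daily counts are invented for illustration, and ‘busiest day sets the scale’ is only the behaviour inferred above from the two graphs, not something GitHub documents here. It assumes a POSIX shell with sort and awk.

```shell
# Made-up daily commit counts for one week; the 40 is a single unusually busy day.
counts="0 1 2 0 3 40 1"

# Scale ceiling as the graphs appear to use it: the busiest single day.
max=$(printf '%s\n' $counts | sort -n | tail -n 1)

# Alternative suggested in the text: the median of the non-zero days.
median=$(printf '%s\n' $counts | awk '$1 > 0' | sort -n |
  awk '{ v[NR] = $1 }
       END { if (NR % 2) print v[(NR + 1) / 2]
             else print (v[NR / 2] + v[NR / 2 + 1]) / 2 }')

echo "max=$max median=$median"
```

With these numbers the one busy day sets a ceiling twenty times higher than the median would, which is why the ordinary days end up looking nearly blank.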
Thankfully, I have so far not been desperate enough to need to ‘prove’ myself to some ghastly metric-oriented boardroom who don’t know the faintest thing about code quality and only look at graphs like this, but, just on the off-chance it ever comes to that, I have felt a certain amount of pressure to commit code on ‘as many different days as possible’, which is the thing rewarded by GitHub’s contribution-graph game. Never mind the size of each commit, and certainly don’t mind its quality, just make sure your commits are spread over as many different calendar days as possible.
Playing the graph game
As you can see, it has now been a full year since I last missed a day. Some of that was due to my having multiple commits ready to go but choosing to delay them so as to have ‘something for tomorrow’ and ‘something for the next day’ as well. Perhaps ‘delayed commits’ are an unintended consequence of this game. At one point I went on a 5-day trip and had separate commits lined up for each of those 5 days, which I executed via an SSH client on a mobile phone. I haven’t yet stooped to scheduling unattended commits in a cron job: what if I die in an accident and somebody notices I seem to be committing from beyond the grave? I care about people too much to risk playing with their emotions like that.
GitHub’s graph can be tricked by fiddling with the dates in your commit log, and there are scripts out there to create fake repository histories whose commit dates cause pictures to appear in the GitHub graph (although the nature of GitHub’s shading calibration means you’d better not combine such a picture with too many other commits in the same year). The newer company GitLab makes fake histories harder to achieve, because the only timestamps that matter to GitLab are the ones made by GitLab’s servers when accepting your push requests. So your script would have to run over many months of real time instead of back-dating hundreds of commits in one sitting. But I don’t know if the decision-makers who are naive enough to judge a coder by these graphs will know the difference between GitHub and GitLab anyway. They might one day learn it, depending on how the reaction to Microsoft’s acquisition of GitHub pans out in the coming months. (BitBucket does not have a graph, at least not yet.)
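For readers wondering what ‘fiddling with the dates’ involves, the sketch below records a commit with a timestamp taken from the environment rather than the clock, using Git’s standard GIT_AUTHOR_DATE and GIT_COMMITTER_DATE variables. The identity, file name and date are hypothetical, and it runs in a throwaway repository. Which of the two dates a given hosting site reads is not something this sketch tries to establish.

```shell
demo_dir=$(mktemp -d)
git init -q "$demo_dir"
cd "$demo_dir"

# Hypothetical identity, so the commit works on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

echo "hello" > notes.txt
git add notes.txt

# Both dates come from these variables instead of the system clock.
GIT_AUTHOR_DATE="2018-01-15 12:00:00 +0000" \
GIT_COMMITTER_DATE="2018-01-15 12:00:00 +0000" \
  git commit -q -m "Back-dated example"

# Show the author date stored in the commit (YYYY-MM-DD).
recorded=$(git log -1 --format=%ad --date=short)
echo "$recorded"
```

Nothing in the repository itself reveals when the commit was really made, which is why a server-side push timestamp of the kind described for GitLab is harder to fake.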
Figure 2 shows my GitLab contribution graph, which has a different (fixed?) shading scale; the large space on the left is because GitLab does not graph contributions until you push them to GitLab servers (which I didn’t do before the Microsoft acquisition of GitHub), and the smaller gaps are due to ‘UTC versus British Summer Time’ differences, on days when I pushed to both GitHub and GitLab shortly after midnight BST without realising that GitLab was working on UTC.
Figure 2
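The ‘UTC versus British Summer Time’ gaps are easy to reproduce on the command line. The sketch below takes a hypothetical push made at half past midnight BST and prints the calendar day it falls on in each time zone. It assumes GNU date (for the -d option) and uses the POSIX-style zone string BST-1 so that it does not depend on a time zone database being installed.

```shell
# A hypothetical push made at 00:30 British Summer Time (UTC+1) on 15 July 2018.
push_time="2018-07-15 00:30:00 +0100"

# Calendar day as seen locally in BST. In POSIX TZ notation the sign is
# inverted, so "BST-1" means one hour ahead of UTC.
bst_day=$(TZ="BST-1" date -d "$push_time" +%F)

# Calendar day as seen by a server that counts in UTC.
utc_day=$(TZ=UTC date -d "$push_time" +%F)

echo "BST: $bst_day  UTC: $utc_day"
```

The same instant lands on 15 July in BST but on 14 July in UTC, so a push just after midnight fills the previous day’s square and leaves the current one empty.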
Many of my commits have been real coding or bug-fixing, but some were admittedly refactoring, minor corrections to comments or help text, or (especially) updates to the dictionary I use for Chinese text parsing, which is pushing the meaning of ‘coding’ somewhat. I thought I was going to have to break the streak in October when my wife and I went on a 9-day trip to see friends in Berlin, but our host had Wi-Fi and I still fiddled with that dictionary on my phone while I was waiting for people to get themselves ready and so on. (Being around native speakers does increase the probability that I’ll be exposed to something that prompts me to edit that dictionary.) As I write this, we are about to go on a 1-month trip to see in-laws in Hong Kong and Taiwan, so it is almost certain I will have to break the streak this time, but if by the time this article is printed you should see my graph does indeed show commits throughout February, then it probably means I’ve been fiddling on my phone again. (That doesn’t sound good, does it?)
It would, of course, be possible to keep a streak going for a month or more entirely automatically, if I were happy to resort to scheduling unattended commits, and if I were happy to artificially ‘shred’ some contribution into dozens of small commits to stretch it out over a month. But although I have previously delayed a commit or two for (at most) a handful of days, I’m not dishonest enough to take it to that extreme, plus I don’t like to think of holding back finished work too long, and I certainly don’t want to schedule automatic commits that do something completely meaningless like add a random number to a comment. I do, however, point out the possibility so as to draw further attention to the uselessness of this metric.
Private repositories and web hosting
GitHub private repositories became free for small teams in 2019, like the non-academic version of BitBucket (BitBucket users with university email addresses can have unlimited private collaborators, just like all GitLab users, but BitBucket non-academic and the new GitHub plan limit the team size of free private repositories). GitHub is so far the only platform to introduce the option of counting private commits on your public graph. I did start using GitHub’s free private repositories as a convenience to access personal private projects (things I can’t distribute because they include other people’s copyrighted material like song lyrics); third-party hosting always carries some small risk that your files will be stolen in a break-in, but provided there is nothing TOO sensitive in there, having them on GitHub or similar can be useful: I can more easily access them from multiple systems and it’s also an off-site backup that’s slightly less fiddly than putting a USB stick in my pocket, as long as the project I want to host fits in the limits (1G max per repository and 100M max per file, unless you install Large File Storage extensions). There is now a script in my ‘bits-and-bobs’ repository called gitify which can help to ‘git’-ify any files you do not currently have under version control: it creates one commit for each file and dates each commit to the timestamp of the file, which is probably the only reasonable way to port timestamps to Git when they might be significant. But I carried on making at least one commit per day to a public repository as well, in case I ever have to delete the private repositories (which will revert the squares on the graph), or in case I ever have to use GitLab’s graph instead of GitHub’s.
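The ‘one commit per file, dated to the file’s timestamp’ idea can be sketched in a few lines of shell. This is not the gitify script itself, whose contents are not shown here, only a minimal illustration of the approach as described. The file names, dates and identity are hypothetical, and it assumes GNU stat and GNU touch plus file names without newlines.

```shell
# NOT the author's gitify script: a minimal sketch of the idea it describes.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Two demo files with old, distinct modification times (GNU touch -d).
echo "first" > a.txt
echo "second" > b.txt
touch -d "2001-02-03 04:05:06 UTC" a.txt
touch -d "2010-11-12 13:14:15 UTC" b.txt

git init -q .
# Hypothetical identity, so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME="Example Author"
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author"
export GIT_COMMITTER_EMAIL="author@example.com"

# Print "mtime name" for each file (GNU stat), oldest first, then make
# one commit per file dated to that file's modification time.
stat -c '%Y %n' -- * | sort -n | while read -r mtime f; do
  git add -- "$f"
  # "<unix-timestamp> <offset>" is Git's internal date format.
  GIT_AUTHOR_DATE="$mtime +0000" GIT_COMMITTER_DATE="$mtime +0000" \
    git commit -q -m "Add $f"
done

# Show the resulting history, oldest commit first.
history=$(git log --reverse --format='%ad %s' --date=short)
echo "$history"
```

Sorting by timestamp first keeps the history in chronological order, so each commit’s parent is never newer than the commit itself.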
All three platforms also offer to host static HTML files for website serving, although GitHub’s free tier allows this only from public repositories. GitLab and BitBucket allow it from private repositories too, even on the free tier; serving pages from a private repository means someone needs to know a page’s URL before they can retrieve it, and also makes it a bit less easy to fork your site. All platforms default to enforcing HTTPS although GitLab lets you turn this off. The simplest setup is to create a repository called username.github.io, username.gitlab.io or username.bitbucket.io (substituting username as appropriate); GitLab also requires a file called .gitlab-ci.yml (see Listing 2 for a simple example).
If you do choose to create multiple mirrors of your website then you would be advised to insert markup like <link rel="canonical" href="http://..."> to indicate each page’s ‘canonical’ address so that search engines are less ‘confused’ by the duplication.
Actual results?
A few months ago I was emailed, out of the blue, a job offer from a startup who said they liked my GitHub activity. But they wanted to pay me equity instead of salary; I was not sufficiently convinced of that company's potential, so I told them I saw it as too much of a risk. I was polite enough to include a couple of suggestions about their idea in my reply, but they didn’t write a second time.
But I really hope I won’t ever have to use a silly GitHub metric to impress a potential employer. I’d much rather leave them with a printed copy of CVu or Overload with one of my own articles in it. I’m happy to say this led to the last company becoming a corporate ACCU member themselves, although I was recently disappointed to discover that someone cancelled the ACCU subscription during the process of the company’s being acquired by Oracle (and I don’t understand the Oracle corporate structure enough to know what to do about this) but at least the subscription did some good while it lasted.
Silas is a partially-sighted Computer Science post-doc in Cambridge who currently works in part-time assistant tuition and part-time for Oracle. He has been an ACCU member since 1994.