CVu Journal Vol 31, #1 - March 2019

Title: GitHub’s Crazy Contribution-Graph Game

Author: Bob Schmidt

Date: Sun, 03 March 2019 15:51:42 +00:00

Summary: Silas S. Brown does a one-year streak.

Body:

For years I had been happy to post my public code on my personal home page, which is mostly hand-coded in HTML and has been in my public_html directory since before the turn of the century (although the external URL had to change a couple of times, which makes things harder for the Internet Archive’s Wayback Machine). Google seems to have no problem finding my pages.

But nowadays some people don’t even know what a personal home page is, because they’re so used to individuals’ Internet presences being confined to large, recognised social-media platforms. There is Facebook (reputedly for people who like to mess around and air family feuds in public; I wouldn’t dare take out an account there and subject my creations to the whims of whatever crazy algorithm they come up with next, but that’s just my opinion), Twitter (reputedly not as bad as Facebook, but sometimes I wonder), and LinkedIn, which is supposedly ‘for work’. LinkedIn is not blocked in China, and it has a Chinese name, ‘LingYing’, which means ‘leading heroes’, but I for one haven’t yet seen any real benefit from being on it, apart from occasionally being able to get back in touch with people whose email addresses have changed. I opened a private account there to write a recommendation for someone who asked for one, and since then I’ve accepted connection requests from people I know (unless they got lost in the spam). However, any actually-interesting news they post tends to be drowned in spammy ‘work is great’ articles generated by the platform itself, which feels as if it’s trying to control my life instead of helping.

Give me my public_html over that any day, apart from the small detail of some people not even knowing what I’m talking about when I say something is on my home page. I was once asked if ‘home page’ means ‘blog’, and I said ‘sort of’, although the closest I’ve come to running it as a blog is writing an ugly shell script that automatically lists all page titles and makes an RSS feed. I only did that as a convenience for when I needed to access my notes quickly from a mobile device, and it doesn’t track when pages are corrected. And the only way to comment is to email me. Very old-school. (An old classmate predicted I’ll eventually be waving a walking-stick shouting “young people don’t know how to write HTML!”)

I did, however, start checking things in to a SourceForge account when someone made me a subdirectory in his project and invited me to check in my stuff there, saying it would be convenient for people to see what I’ve changed before they update. I had previously thought “people can just check my home page to see if I’ve changed the version number since they last downloaded, and if they want a more detailed comparison they can run diff themselves”, but I conceded that a repository does give added convenience even when there’s only one contributor; plus, having my code in version control makes me less frightened of deleting old parts instead of commenting them out. I did not, however, trust the SourceForge platform enough to be willing to ‘live’ there: my home page stayed where it was, and the repository was very much a secondary ‘mirror’ of my code, without the HTML explanations that accompany it. I have an automatic script to fetch the latest code from my home page and update the repository, although unfortunately that means most of my commit summary messages just say ‘update’, which is not very informative. (See the cartoon below, from the xkcd comic, which can be found at https://xkcd.com.)

GitHub for the boardroom?

After SourceForge descended into an advertisement-ridden mess (fake Download buttons became particularly obnoxious), most projects switched to Git and GitHub, along with its later competitors GitLab and BitBucket (now owned by Atlassian). There are of course other Git hosting providers, like Assembla and Beanstalk, but they don’t tend to have free tiers in their pricing models and are therefore less popular in the ‘open source’ world. I had already used Git on some closed-source projects (mostly when others I was working with wanted to use Git, or as a convenience to track old versions locally), but I hadn’t bothered hosting anything on GitHub.

Then in late 2017 I was working part-time for a startup (it had been spun off from someone’s university research and I ended up getting involved), and someone there said he was trying to pitch the company to investors, who wanted to see employees’ personal projects on their GitHub profiles. It seemed he thought the company’s work projects might not be sufficiently impressive, so it was better to show off employees’ personal projects as well, and it had to be GitHub because that was the only platform the investors had heard of. So I obligingly ported my Subversion directory into a bunch of GitHub projects and extended my ‘update’ scripts to mirror to both. (Some projects are only in Git, as I didn’t bother adding them to the Subversion script, but anything I previously had in Subversion continues to be updated there as well.)

The company was sold soon after, but I kept going with GitHub in case it was convenient to anyone, and later added GitLab and BitBucket ‘just in case’ after GitHub was acquired by Microsoft and the general atmosphere of apprehension set in. Git lets you give one remote several push URLs, so that a single ‘git push’ updates multiple servers (see Listing 1), which means I’m able to keep the proverbial foot in all three camps for now.

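# Remove any push URLs already set ('.' is a regex matching any URL)
git remote set-url --delete --push origin .
# Make GitHub the first push URL, then add GitLab and BitBucket alongside it
git remote set-url --push origin "git@github.com:$USER/$REPO.git"
git remote set-url --add --push origin "git@gitlab.com:$USER/$REPO.git"
# BitBucket repository slugs are lower-case
git remote set-url --add --push origin "git@bitbucket.org:$USER/$(echo "$REPO" | tr '[:upper:]' '[:lower:]').git"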

Listing 1
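To check what Listing 1 actually does, the following sketch replays the same commands in a throwaway repository. It uses hypothetical user and repository names held in lower-case variables, so that the shell's own USER is not overwritten. Nothing is pushed, because set-url only edits the repository's configuration. The final commands list the push URLs that a single git push would now visit, and show that fetches still come from GitHub.

```shell
#!/bin/sh
# Throwaway demo of Listing 1's effect: one remote with three push URLs.
# No network access is needed; set-url only edits .git/config.
# 'user' and 'repo' are hypothetical placeholder values.
user=example-user
repo=Example-Repo

cd "$(mktemp -d)" || exit 1
git init -q
git remote add origin "git@github.com:$user/$repo.git"

# Same steps as Listing 1 (a brand-new repository has no push URLs to delete)
git remote set-url --push origin "git@github.com:$user/$repo.git"
git remote set-url --add --push origin "git@gitlab.com:$user/$repo.git"
lower=$(printf '%s' "$repo" | tr '[:upper:]' '[:lower:]')
git remote set-url --add --push origin "git@bitbucket.org:$user/$lower.git"

git remote get-url --push --all origin   # every URL a 'git push' will update
git remote get-url origin                # fetches still come from GitHub
```

The --delete step from Listing 1 is left out here because a freshly initialised repository has no push URLs to remove.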

However, I couldn’t help feeling a tiny little bit worried about what he’d said: the thought of those investors, who don’t know what a home page is, trying to judge an individual’s contributions on GitHub. If they don’t know what a home page is, would they really be able to look at code and see if it’s any good? Or are they just looking at numbers, metrics like those in Figure 1?

Figure 1 shows my GitHub contribution graph at the time of writing. It can be found at http://github.com/ssb22 when the site is set to Desktop mode (the graphs are not shown in Mobile mode). The top graph is with GitHub set to show public commits only; the bottom one is with GitHub set to show private commits as well. If you look carefully, you will notice that some squares which are shaded darkly on the ‘public commits only’ graph are shaded less darkly on the graph that includes private commits. It seems the shading scale is adjusted to reflect the maximum number of commits on any one day over the period shown, so a high number of commits on one particular day can make the rest of the graph look flatter for a year. It’s a pity they didn’t take more of an average when calibrating this scale; I would have coded ‘the median of all non-zero days’ if it had been my job (although I would probably have opposed the implementation of the graph in the first place).

Figure 1
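The calibration problem can be made concrete with a small shell sketch. It assumes four shading levels and a simple ceiling-based rule. This is a guess made for illustration, not GitHub's documented algorithm. The sketch shades a made-up week of commit counts twice: once relative to the busiest day, and once relative to the median of the non-zero days, scaled so that a median day lands on level 2.

```shell
#!/bin/sh
# Sketch only: compare two ways of calibrating contribution-graph shading.
# Assumptions (not GitHub's documented algorithm): four shading levels,
# and a non-empty day gets level ceil(4 * count / reference), capped at 4.

# shade REFERENCE COUNT: print 0 for an empty day, otherwise 1..4
shade() {
  awk -v ref="$1" -v n="$2" 'BEGIN {
    if (n == 0) { print 0; exit }
    x = 4 * n / ref
    lvl = int(x); if (lvl < x) lvl++    # ceiling of x
    if (lvl > 4) lvl = 4
    print lvl
  }'
}

# median_nonzero: read counts one per line, print the median of the non-zero ones
median_nonzero() {
  awk '$1 > 0' | sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) { print 0; exit }
      if (NR % 2) print v[(NR + 1) / 2]
      else print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

counts="1 2 0 3 1 20 2"    # hypothetical week containing one very busy day
max=$(printf '%s\n' $counts | sort -n | tail -n 1)
# Use twice the median as the reference, so a median day gets level 2
medref=$(printf '%s\n' $counts | median_nonzero | awk '{ print 2 * $1 }')

for c in $counts; do
  printf 'commits=%-3s busiest-day scale: %s  median scale: %s\n' \
    "$c" "$(shade "$max" "$c")" "$(shade "$medref" "$c")"
done
```

With these invented counts, the single 20-commit day sets the reference for the busiest-day scale, so the ordinary days (1 to 3 commits) all collapse to level 1. Under the median scale they spread across levels 1 to 3.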

Thankfully, I have so far not been desperate enough to need to ‘prove’ myself to some ghastly metric-oriented boardroom who don’t know the faintest thing about code quality and only look at graphs like this, but, just on the off-chance it ever comes to that, I have felt a certain amount of pressure to commit code on ‘as many different days as possible’, which is the thing rewarded by GitHub’s contribution-graph game. Never mind the size of each commit, and certainly don’t mind its quality: just make sure your commits are spread over as many different calendar days as possible.

Playing the graph game

As you can see, it has now been a full year since I last missed a day. Some of that was due to my having multiple commits ready to go but choosing to delay them so as to have ‘something for tomorrow’ and ‘something for the next day’ as well. Perhaps ‘delayed commits’ are an unintended consequence of this game. At one point I went on a 5-day trip and had separate commits lined up for each of those 5 days, which I executed via an SSH client on a mobile phone. I haven’t yet stooped to scheduling unattended commits in a cron job: what if I die in an accident and somebody notices I seem to be committing from beyond the grave? I care about people too much to risk playing with their emotions like that.
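If you want to see how your own history scores on this game without visiting the website, here is a rough shell sketch. It lists the distinct commit days from git log and finds the longest run of consecutive days. It assumes GNU date and takes each commit's day from its author date in the author's own time zone. The hosting site may bucket days differently, so treat the numbers as approximate.

```shell
#!/bin/sh
# Sketch: measure the "commit on as many days as possible" metric locally.
# Assumptions: GNU date is available, and a commit's day is its author
# date in the author's own time zone (a hosting site may bucket days
# differently, for example in UTC).

# commit_days [git-log options...]: print each distinct commit day, oldest first
commit_days() {
  git log --format=%ad --date=short "$@" | sort -u
}

# longest_streak: read sorted, unique YYYY-MM-DD dates on stdin and print
# the length of the longest run of consecutive calendar days
longest_streak() {
  best=0; run=0; prev=""
  while read -r d; do
    day=$(( $(date -u -d "$d" +%s) / 86400 ))   # whole days since 1970-01-01
    if [ -n "$prev" ] && [ "$day" -eq $((prev + 1)) ]; then
      run=$((run + 1))
    else
      run=1
    fi
    if [ "$run" -gt "$best" ]; then best=$run; fi
    prev=$day
  done
  echo "$best"
}
```

Inside a repository, commit_days --since='1 year ago' | wc -l counts the distinct days in roughly the last year. (The --since filter goes by committer date, hence "roughly".) commit_days | longest_streak reports the longest streak.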

GitHub’s graph can be tricked by fiddling with the dates in your commit log, and there are scripts out there to create fake repository histories whose commit dates cause pictures to appear in the GitHub graph (although the nature of GitHub’s shading calibration means you’d better not combine such a picture with too many other commits in the same year). The newer company GitLab makes fake histories harder to achieve, because the only timestamps that matter to GitLab are the ones made by GitLab’s servers when accepting your push requests, so your script would have to run over many months of real time instead of back-dating hundreds of commits in one sitting. But I don’t know if the decision-makers who are naive enough to judge a coder by these graphs will know the difference between GitHub and GitLab anyway. They might one day learn it, depending on how the reaction to Microsoft’s acquisition of GitHub pans out in the coming months. (BitBucket does not have a graph, at least not yet.)
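Back-dating works on a graph built from commit metadata because Git records whatever dates the client supplies. The sketch below shows this in a throwaway repository, using Git's documented GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. The identity and date are arbitrary. It demonstrates the weakness and is not a recommendation. It would also make no difference to GitLab's graph, which, as described above, relies on the server's push timestamps.

```shell
#!/bin/sh
# Demonstration only: Git stores whatever commit dates the client supplies.
# Runs in a throwaway repository; the identity and date are arbitrary.
cd "$(mktemp -d)" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Both the author date and the committer date are taken from the environment
GIT_AUTHOR_DATE="2017-01-01 12:00:00 +0000" \
GIT_COMMITTER_DATE="2017-01-01 12:00:00 +0000" \
  git commit -q --allow-empty -m "back-dated example"

# Prints "2017-01-01 2017-01-01" (author date, then committer date)
git log -1 --format='%ad %cd' --date=short
```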

Figure 2 shows my GitLab contribution graph, which has a different (fixed?) shading scale. The large space on the left is because GitLab does not graph contributions until you push them to GitLab servers (which I didn’t do before the Microsoft acquisition of GitHub), and the smaller gaps are due to ‘UTC versus British Summer Time’ differences, on days when I pushed to both GitHub and GitLab shortly after midnight BST without realising that GitLab was working on UTC.

Figure 2
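You can reproduce the midnight effect with GNU date. The same instant falls on different calendar days depending on the time zone used to bucket it. The push time below is a made-up example of a push half an hour after midnight BST.

```shell
#!/bin/sh
# Sketch (GNU date assumed): the calendar day a push lands on depends on the
# time zone used to bucket it. The push time is a made-up example of a push
# shortly after midnight British Summer Time (UTC+1).
push_time="2018-07-01 00:30:00 +0100"

# day_in ZONE: print the calendar day of $push_time in time zone ZONE
day_in() {
  TZ="$1" date -d "$push_time" +%F
}

echo "Europe/London: $(day_in Europe/London)"   # 2018-07-01 (needs tz data)
echo "UTC:           $(day_in UTC)"             # 2018-06-30
```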

Many of my commits have been real coding or bug-fixing, but some were admittedly refactoring, minor corrections to comments or help text, or (especially) updates to the dictionary I use for Chinese text parsing, which is pushing the meaning of ‘coding’ somewhat. I thought I was going to have to break the streak in October when my wife and I went on a 9-day trip to see friends in Berlin, but our host had Wi-Fi and I still fiddled with that dictionary on my phone while I was waiting for people to get themselves ready and so on. (Being around native speakers does increase the probability that I’ll be exposed to something that prompts me to edit that dictionary.) As I write this, we are about to go on a 1-month trip to see in-laws in Hong Kong and Taiwan, so it is almost certain I will have to break the streak this time. But if, by the time this article is printed, you see that my graph does show commits throughout February, it probably means I’ve been fiddling on my phone again. (That doesn’t sound good, does it?)

It would, of course, be possible to keep a streak going for a month or more entirely automatically, if I were happy to resort to scheduling unattended commits and to artificially ‘shred’ some contribution into dozens of small commits to stretch it out over a month. But although I have previously delayed a commit or two for (at most) a handful of days, I’m not dishonest enough to take it to that extreme. Besides, I don’t like the idea of holding back finished work for too long, and I certainly don’t want to schedule automatic commits that do something completely meaningless like add a random number to a comment. I do, however, point out the possibility so as to draw further attention to the uselessness of this metric.

Private repositories and web hosting

GitHub private repositories became free for small teams in 2019, like the non-academic version of BitBucket. (BitBucket users with university email addresses can have unlimited private collaborators, just like all GitLab users, but BitBucket’s non-academic plan and the new GitHub plan limit the team size of free private repositories.) GitHub is so far the only platform to introduce the option of counting private commits on your public graph. I did start using GitHub’s free private repositories as a convenience for personal private projects (things I can’t distribute because they include other people’s copyrighted material, like song lyrics). Third-party hosting always carries some small risk that your files will be stolen in a break-in, but provided there is nothing TOO sensitive in there, having them on GitHub or similar can be useful. I can more easily access them from multiple systems, and it’s also an off-site backup that’s slightly less fiddly than putting a USB stick in my pocket, as long as the project fits within the limits (1 GB maximum, and 100 MB maximum per file unless you install the Git Large File Storage extension). There is now a script in my ‘bits-and-bobs’ repository called gitify which can help to ‘git’-ify any files you do not currently have under version control. It creates one commit for each file and dates each commit to the timestamp of the file, which is probably the only reasonable way to port timestamps to Git when they might be significant. But I carried on making at least one commit per day to a public repository as well, in case I ever have to delete the private repositories (which would revert the squares on the graph), or in case I ever have to use GitLab’s graph instead of GitHub’s.
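The gitify idea can be sketched in a few lines of shell. This is a hypothetical reconstruction, not the script in the bits-and-bobs repository. It assumes GNU find and stat, file names without newlines, and a configured Git identity, and it works on the current directory. Files are committed oldest first, one commit each, using Git's internal date format (seconds since the epoch followed by a time-zone offset). Files over the 100M per-file limit mentioned above are skipped rather than handed to Large File Storage.

```shell
#!/bin/sh
# Hypothetical "gitify"-style sketch (NOT the author's actual script):
# put the files under the current directory into a new Git repository,
# one commit per file, oldest first, each commit dated to the file's
# modification time. Assumes GNU find and stat, file names without
# newlines, and a configured Git identity.
gitify_sketch() {
  limit=$((100 * 1024 * 1024))   # per-file limit from the text, taken as 100 MiB
  git init -q || return 1
  # "%T@ %p" prints modification time (epoch seconds) and path; skip .git
  find . -path ./.git -prune -o -type f -printf '%T@ %p\n' | sort -n |
  while read -r mtime path; do
    size=$(stat -c %s "$path")
    if [ "$size" -gt "$limit" ]; then
      echo "skipping $path ($size bytes)" >&2
      continue
    fi
    stamp="${mtime%.*} +0000"    # Git internal date format: <seconds> <offset>
    git add -- "$path" &&
      GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" \
        git commit -q -m "Add ${path#./}"
  done
}
```

To use it, cd into a directory that is not yet under version control and run gitify_sketch.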

All three platforms also offer to host static HTML files for website serving, although GitHub’s free tier allows this only from public repositories. GitLab and BitBucket allow it from private repositories too, even on the free tier; serving pages from a private repository means someone needs to know a page’s URL before they can retrieve it, and also makes it a bit less easy to fork your site. All platforms default to enforcing HTTPS, although GitLab lets you turn this off. The simplest setup is to create a repository called username.github.io, username.gitlab.io or username.bitbucket.io (substituting username as appropriate); GitLab also requires a file called .gitlab-ci.yml (see Listing 2 for a simple example).
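As a hedged sketch, not necessarily the same as Listing 2, the shell snippet below writes a minimal .gitlab-ci.yml for a plain-HTML site. It assumes the HTML files live at the repository root and that only the default branch should be published. It relies on the GitLab Pages convention that a job named pages publishes whatever it leaves in a public artifact directory. The here-document is quoted so that the shell leaves the $CI_... variables for GitLab to expand.

```shell
#!/bin/sh
# Sketch, not necessarily the article's Listing 2: write a minimal
# .gitlab-ci.yml for GitLab Pages into the current directory (the
# repository root). Assumes the HTML files sit at the root and only the
# default branch should be published.
cat > .gitlab-ci.yml <<'EOF'
pages:
  stage: deploy
  script:
    # copy everything except dotfiles into "public", via a hidden
    # temporary directory so that the copy does not include itself
    - mkdir .public
    - cp -r * .public
    - mv .public public
  artifacts:
    paths:
      - public
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
EOF
```

The rules keyword belongs to newer GitLab releases. Older configurations used only: followed by the branch name (typically master) instead.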

If you do choose to create multiple mirrors of your website, you would be advised to insert markup like <link rel="canonical" href="http://..."> to indicate each page’s ‘canonical’ address, so that search engines are less ‘confused’ by the duplication.
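Generating the canonical tag for each mirrored page is mechanical. The helper below is hypothetical. It takes the primary site's base URL and a page path, and prints the corresponding link element for pasting into that page's head. The example.org address is a placeholder, not a real address from this article.

```shell
#!/bin/sh
# Hypothetical helper: print the canonical <link> tag for one page, given
# the primary site's base URL and the page's path relative to the site root.
canonical_tag() {
  base=${1%/}    # drop a trailing slash from the base URL
  page=${2#/}    # drop a leading slash from the page path
  printf '<link rel="canonical" href="%s/%s">\n' "$base" "$page"
}

# Placeholder base URL; prints the tag for notes/index.html
canonical_tag "https://example.org/home/" "/notes/index.html"
```

For example, find . -name '*.html' | while read -r f; do canonical_tag https://example.org/home "${f#./}"; done prints one tag per page of a local copy of the site.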

Actual results?

A few months ago I was emailed, out of the blue, a job offer from a startup who said they liked my GitHub activity. But they wanted to pay me in equity instead of salary; I was not sufficiently convinced of that company’s potential, so I told them I saw it as too much of a risk. I was polite enough to include a couple of suggestions about their idea in my reply, but they didn’t write a second time.

But I really hope I won’t ever have to use a silly GitHub metric to impress a potential employer. I’d much rather leave them with a printed copy of CVu or Overload with one of my own articles in it. I’m happy to say this led to the last company becoming a corporate ACCU member themselves. I was recently disappointed to discover that someone cancelled the ACCU subscription while the company was being acquired by Oracle (and I don’t understand the Oracle corporate structure well enough to know what to do about this), but at least the subscription did some good while it lasted.

Silas S. Brown
Silas is a partially-sighted Computer Science post-doc in Cambridge who currently works in part-time assistant tuition and part-time for Oracle. He has been an ACCU member since 1994.
