Journal Articles

CVu Journal Vol 27, #4 - September 2015

Title: What do people do all day?

Author: Martin Moene

Date: 02 September 2015 07:27:20 +01:00

Summary: Christopher Gilbert shares his routine in a software house.

Body: 

I work as a Senior Software Engineer for DataSift, the world’s leading social data provider, inventor of the Twitter ‘retweet’ button, and fastest growing SaaS start-up in Europe. In my previous life I worked as a software engineer in the video games industry, working for award winning independent studios across the UK. These days I spend most of my time working on real problems in imaginary fields of buzzwords.

I recently finished working on one of the most exciting projects of my career, a project known as Facebook Topic Data. We (DataSift) are the first company to partner with Facebook to deliver real-time insights into brands, topics and audiences for Facebook customers. I am currently working on a top secret project, the details of which will be revealed in the not-too-distant future.

The team

DataSift currently employs over 140 people, distributed across San Francisco, New York, Reading and London. When I joined, the core engineering team consisted of just 10 developers, but has since swelled to over 50. Many of the developers are begrizzled veterans, with hard experience working at the coalface in related fields; ex-contractors and others of that ilk who are attracted to the particular brand of autonomy a startup company breeds.

The main office in Reading is on a single floor, mostly open plan, with a few meeting rooms which we have been spilling into as the team grows. There is a kitchen where lunch is served daily and which is otherwise fully stocked with free drinks and snacks, a chillout area with arcade machines, a table-tennis table and a pool table, and a quiet room for interruption-free working, which makes for a pleasant retreat from the numerous – and increasingly elaborate – nerf wars. Lego and office toys are littered everywhere, sometimes bought from a store, sometimes homespun out of odd parts and a Raspberry Pi or Arduino by one of our hardware hackers.

Amongst other perks already mentioned, developers get to choose their own kit, attend conferences of their choosing – which I have been exploiting to attend ACCU, the best conference of the year – and work on their own projects in company sponsored hackathons and Innovation time (aka 20% time). I recently used my 20% time to research and develop a slew of algorithms that would accelerate our bespoke filtering and classification system, though other proposals have ranged from innovative new product features all the way through to obscure programming language concepts.

The software

The main product consists of around 400 or so components, each written in one of several languages including C++11, Python, Ruby, Node, PHP, Java, Scala and Go. Because of the range of languages in use, polyglot developers are highly sought after by our hiring team, though they seem to be increasingly rare jewels to find. The freedom of language choice can be quite liberating, allowing us to draw upon the relative strengths of each language to solve any given use-case.

Our infrastructure consists of around 300 or so commodity servers running CentOS Linux and managed using Chef and colocated at our datacentre. Internally we have a self-serve provisioning system based on OpenStack and a cluster of dedicated servers, plus dedicated staging hardware. Metrics are provided via Graphite and Riemann, log aggregation via Logstash and Kibana. A custom written intranet portal provides a full range of dashboards for convenience.

Our main product can be thought of as a distributed information retrieval system built using a microservice style architecture, before microservice was a buzzword. In many ways the DataSift platform is like several products in one: generic data ingestion and normalisation, filtering and classification, generic data storage and delivery, historic (‘big data’) data access, aggregation and reporting, and more coming soon!

I primarily work on the real-time pipeline, which is almost all C++11 code written within the last 5 years, and makes extensive use of the Boost library. An in-house compiler and virtual machine run filters written in CSDL, a bespoke filtering and classification DSL that is intended for our customers and is also used internally by our data science team. Besides Boost we make use of third party open source software including Riemann, Kafka, Redis, MySQL, Zookeeper, ZeroMQ, memcached and many more. We package and maintain over 40 open-source projects besides the default packages provided by the OS, several of which were written in-house by our developers. Performance is always a primary concern, as C++ is chosen specifically as the implementation language for high-performance components.
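As an illustration of the general compile-then-execute idea behind a filter compiler and virtual machine, here is a toy sketch in Python. It is not CSDL and does not describe DataSift's implementation: the tuple-based expression format, the instruction names and the functions `compile_filter` and `run_filter` are all hypothetical, chosen only to show a filter being compiled once and then evaluated cheaply against each incoming record.

```python
# Toy sketch only: a hypothetical filter format, not CSDL.
# A filter is a nested tuple such as
#   ("and", ("contains", "content", "coffee"), ("not", ("equals", "lang", "fr")))

def compile_filter(expr):
    """Flatten a nested filter expression into postfix stack instructions."""
    kind = expr[0]
    if kind in ("and", "or"):
        _, left, right = expr
        return compile_filter(left) + compile_filter(right) + [(kind.upper(),)]
    if kind == "not":
        return compile_filter(expr[1]) + [("NOT",)]
    if kind == "contains":
        _, field, needle = expr
        # Lower-case the needle at compile time so the VM does less per record.
        return [("CONTAINS", field, needle.lower())]
    if kind == "equals":
        _, field, value = expr
        return [("EQUALS", field, value)]
    raise ValueError(f"unknown expression: {kind!r}")


def run_filter(program, record):
    """Run a compiled program against one record (a dict) on a small stack VM."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "CONTAINS":
            _, field, needle = instr
            stack.append(needle in str(record.get(field, "")).lower())
        elif op == "EQUALS":
            _, field, value = instr
            stack.append(record.get(field) == value)
        elif op == "NOT":
            stack.append(not stack.pop())
        elif op == "AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left and right)
        elif op == "OR":
            right, left = stack.pop(), stack.pop()
            stack.append(left or right)
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return stack.pop()


# Compile once, then evaluate against each record in a stream.
program = compile_filter(
    ("and", ("contains", "content", "Coffee"), ("not", ("equals", "lang", "fr")))
)
matched = run_filter(program, {"content": "I love COFFEE", "lang": "en"})
```

The point of the split is that the comparatively expensive parsing and normalisation happens once per filter, while the per-record work is a short loop over flat instructions.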

The kit

Most Sifters are self-confessed hackers, so in DataSift, ‘Windows’ is a dirty word. Almost everyone works on custom desktops with an additional laptop running Ubuntu Linux or some other flavour of Linux and a minimalist tiling window manager like i3 or xmonad with Emacs or Vim. Some developers prefer to develop on Mac, so MacBook Pros and 4k monitors adorn the desks of the hipsters (myself included).

Development tools are not mandated, so there’s complete freedom to choose what you want to work with. Some prefer Vim, others Sublime Text, some even use JetBrains IDEs. We license the best-in-class tools, and for everything else we draw upon FOSS to fill our needs. Although there are quite a few tools that just don’t work well (if at all) under Linux, it still amazes me how large the open source community is, and how good the available software is.

Every room has a monitor kitted with a Chromecast and Apple TV, and each meeting room is equipped with video conferencing gear. Many developers opted for standing desks that are height adjustable, and there are bean bags and comfy looking sofas around the place for anyone who wants to use them.

The process

We work in an agile way, but in the true spirit of agile we have customised our tools to fit the team, rather than fitting the team to the tools. It can take an enormous amount of effort to write custom tools, but in the case of ticket tracking we chose to instead spend a (relatively enormous) amount of money on Jira, which can be customised in every way imaginable. Tools are usually adopted by team consensus rather than mandate by management.

We implemented a continuous delivery pipeline using the recently open-sourced Thoughtworks go.cd, and an elaborate system of levers and pulleys to automate testing and packaging of every component. The cluster then uses Docker to run integration tests using an open-source testing framework developed in-house.

We use git for source control. We used git flow for a time, but as we transitioned to ever more sophisticated automation it became increasingly difficult to handle multiple release branches, so we have been moving towards a simplified workflow. For practical purposes we still make use of feature branches, but we are tending towards a branch-by-ticket model, which allows us to track work all the way from Jira through the entire delivery pipeline.
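One way a branch-by-ticket model can make work traceable is by putting the ticket key in the branch name, so that any pipeline stage can recover it. The Python sketch below assumes a hypothetical naming convention such as `feature/PIPE-123-faster-filters`; the convention, the project key `PIPE` and the function `ticket_for_branch` are illustrative and not taken from DataSift's setup.

```python
import re

# Jira-style issue keys look like "ABC-123": an upper-case project key,
# a hyphen, then a number. The lookbehind avoids matching mid-word.
TICKET_RE = re.compile(r"(?<![A-Z0-9])([A-Z][A-Z0-9]+-\d+)")


def ticket_for_branch(branch):
    """Return the first Jira-style issue key in a branch name, or None."""
    match = TICKET_RE.search(branch)
    return match.group(1) if match else None


# A build step could tag its artefacts or log lines with this key.
key = ticket_for_branch("feature/PIPE-123-faster-filters")
```

Branches that carry no key, such as `master`, simply return `None`, which a pipeline could treat as "not tied to a ticket".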

The future

Like any team we have challenges to overcome. Perhaps counter-intuitively, it’s much harder to scale a team quickly than it is to scale software, and we’re still learning the most effective ways of doing that. Other trends that seem to be emerging right now are increasing sophistication in how we trap and respond to errors, improving our agile process with new techniques, growing our team with other like-minded s/crazies/developers/, adding even more languages to our bulging repertoire (my money is on Rust being next), and building increasingly ambitious features to augment our core product offering.

However, if working for a start-up has taught me one thing, it’s that what happens next is impossible to predict, but that’s what excites me about this work. Working for a start-up is often chaotic, and sometimes a bit crazy, but always challenging and ultimately deeply rewarding. You are forced to innovate or die trying, and if you lose momentum you will be left in the ruins of wasted effort. For some this environment would be a terrible nightmare, but for those who thrive in the chaos, and maybe have just that tiny bit of crazy in them, it’s the best job in the world.

If you’d like to find out more about DataSift please visit our website at http://www.datasift.com/. Check out my open source contributions on GitHub at http://www.github.com/bigdatadev or follow me on Twitter @bigdatadev
