
Data Science Campus sets off to make its mark on ONS

David Bicknell Published 26 April 2017

Campus managing director Tom Smith explains how the new organisation can help the UK catch up with other advanced economies in the exploitation of administrative data

 

The Office for National Statistics (ONS) recently launched its new Data Science Campus at its Newport HQ as part of a £17m investment in statistics to harness new data sources and technologies.

Its creation followed directly from the Bean Review, led by Charlie Bean, a former deputy governor of the Bank of England, which called for a shake-up of data collection and concluded that the UK “significantly lags” behind many other advanced economies in its exploitation of administrative data.

Among its recommendations, Bean called for the ONS to establish a new centre for the development and application of data-science techniques to the production of economic statistics.

The new Data Science Campus will work on projects within five themes, under the collective title of “People, Planet and Prosperity”: evolving economy; urban and rural; society; sustainability; and UK in a global context. The Campus intends to work with national and international partners from academia, government and business to deliver joint research programmes and to build the UK’s data science capability, including providing funding opportunities for PhD candidates.

As part of its new approach, the ONS recently set up the first UK Data Analytics apprenticeship and launched a new internship in data journalism, both part of an ongoing investment in skills to meet the challenge of providing richer and faster statistics to inform the UK’s decision-makers and public.

The managing director of the new Data Science Campus, Tom Smith, recently sat down with Government Computing ahead of ‘purdah’ to discuss the Campus’s role at the forefront of data science in the UK, becoming a hub and incubator for data science, as well as collaborating with organisations across the UK and internationally.

For the ten years before taking on the role, Smith ran a research spinout from Oxford University, a data and technology company built from scratch.

Smith said, “We’ve been funded by the Treasury as a result of the Bean Review to step up the way that ONS develops and generates understanding of the economy and society across the UK. That matches ONS’s remit to produce statistics for the public good.”

The work that ONS is doing focuses on developing aggregate statistics to help inform decision makers. As part of that, ONS specialists say, the organisation is looking for ways to make that data as good as it can be so that any key decisions that are made are as well-founded as they can be.

The Bean Review had picked up on a perception that the ONS had failed to keep pace with some of the technical developments and the ability to carry out analysis on large data sources at speed using techniques such as machine learning, deep learning-type approaches, text analysis and natural language processing. In general, there was a feeling from outside the ONS that it had fallen behind the curve.

Data sources

“Both of those were fair points to raise in the review, although we’ve moved a long way in the last year,” says Smith. “The funding came out of the back of the Bean Review and supported ONS to do two things. One of these is the Economic Statistics Centre of Excellence (ESCOE), which is pushing forward on our understanding of underlying economic fundamentals. And ONS commissioned an external group led by the National Institute of Economic and Social Research (NIESR) to deliver this. So there’s a really interesting piece there. Alongside that, the Data Science Campus was funded to strengthen the capacity and capability for ONS to do leading-edge data science work. And hence here we are.”

According to Smith, the Data Science Campus’s remit is about strengthening ONS’s ability to carry out more complex data science analysis, and testing new data sources. “Basically the Campus is here to apply leading-edge data science for public good. I want to take the sort of skills that are typically used in the commercial world to sell advertising, and use them to improve what we know about the world so that government and others can make better decisions.”

Smith is reluctant to use the word “skunkworks” to describe the work going on at the Data Science Campus. Instead, he prefers another description.

“Our aim is to do research in innovative things at speed and with a clear view of understanding where they bring value to traditional ONS outputs. Essentially by assessing this data source, this technique or both, can you strengthen, improve or add to the sort of things we are already publishing? Our success can be judged in a number of ways, and one of them will be how many concrete improvements did we come up with over time?”

Smith is keen for the Data Science Campus to get beyond the idea that it is just playing with data. “We want to say, this is a science concept, so you have to set up experiments. We need to define what success is, and that might be part of the experiment, to define that measure. We might need to iterate and see if we can improve on that starting point. But at some point we’ll make a call and say, ‘You know what. This source, it’s really interesting. I can see it’s useful for these sorts of things. But for ONS it doesn’t necessarily add value.’

“And that would be a successful project, where we actually find out something that doesn’t work. But of course a really successful project would be finding something that’s useful, that can add to what we already publish and in due course would aim to be loaded in and in some way, added to official outputs.”

Learning from the Digital Service Standard

Having worked outside government for many years, Smith says one of the things he has seen from that standpoint is that the public sector does an enormous range of things, many of which don’t see the light of day for all sorts of good reasons.

“But I think what that can lead to is it becomes difficult to stop projects,” he says. “So what we’re really interested in is running a data science project that takes six months, with a very clear view of what the end point is, and publishing what we found.

“There is a point that we can learn from the way that digital projects and digital service have been viewed for some time, which is really typified by the GDS Digital Service Standard. One of the standards we’ve been discussing is around iterate and improve. And the key bit there is ‘improve’, which means you have to have a measure of what good and bad is. And what better is and what improve actually means. So that gives you a measure of what your end point is.”

Operating at pace

Beyond what the Data Science Campus is there to do, Smith says, is the question of how it will work effectively. There are three or four key elements, he suggests.

“One of these is operating at pace and having a very clear understanding for each project of what improvement means, when are we going to stop, what does ‘done’ mean? And that’s very much from an agile kind of framework.

“The next one is there are lots of ‘reads-across’ from existing digital frameworks like the service standards which say, ‘Have you really understood user needs?’ ‘Have you put in place a way of testing this with users?’ All of this in the research world, we’re probably not quite as good at nailing. We tend to be more like, ‘This is an interesting problem. Let’s have (a look) at it.’ Whereas actually putting a little bit of framework around it helps turn that from something interesting, in principle, to where you do that exploring as part of something and you have a very clear view of a product or a thing that you’re producing at the end.”

Another area is how the Data Science Campus and the ONS will work together. “The pipeline from the Data Science Campus into ONS is something we’re working on,” says Smith. “How we collaborate across ONS teams is something many of them have been thinking about for a while. How do you add bandwidth and capacity around how you deal with big datasets, and what kind of techniques might be appropriate for using here? It is essentially about working much more closely with the expert teams across ONS. So there is a really strong collaborative element.

“How that boils through into ONS official outputs, I can see in three ways. The first is a process one, and comes back to how you develop data tools and the link across to digital products. Things like automating testing and checking our practices are robust. That comes from digital but is really getting into analysis: reproducible, reviewable, audited, all that kind of stuff.”

Census

He goes on, “More interestingly in terms of data outputs, there is something around how do we validate some of the work they’re doing? So this is a second area that we and other groups in ONS – including the Big Data team – are doing already, including some good internal things on core ONS outputs.

“For example, if you’re looking at the Census, you essentially have a comprehensive survey at a point in time. And there is a large amount of other data that you can use to cross check against that data. So, for example, say you’re looking at commuter patterns. People travel to work; you’ve got data from mobile phones, potentially, which you can use to estimate residence to workplace movement.  Or if you’re looking at who’s moved in the previous year, or student populations, again, you’ve got data.  Or you might have a theory that Twitter can help you with your location and get you some way of cross checking, triangulation of what you see in the Census, against other data sources in the real world.”

Smith highlights that a high-profile element of the work ONS does is the Census, which will take place in 2021. “The Census has been described both in and out of the ONS as the jewel in the crown for ONS folks. It’s a huge part of the energy and activity of the organisation and it’s a hugely important data source for government, businesses, non-profits and so on. So any work that we or other groups in ONS can do to support Census outputs as reliable, robust, accurate and up to date and so on, will have big impacts in terms of official outputs. So that’s the second example.

“The third one is that there may well be data outputs that we can produce as part of our research, which may not at this stage be official statistics but we may put them out as research outputs. For me that’s one of the areas of interest. That’s the ‘edge’ if you like, because that’s where you start pushing on to totally new ways of understanding the world.”

Google Street View

Smith cites a recent example of a project, still in its early days, built around specific techniques. “One of the things we’re interested in is using image analysis and processing and classification. It’s a difficult thing. As a statistician it’s not something that many statisticians do in their day job but there can be huge amounts of value to it. One of the recent examples is Flowminder doing an Afghanistan population census based on satellite imagery. And there are commercial organisations like Orbital Insight that are doing things like estimating the economic growth of cities in China.

“The one I wanted to pick out, which matches back to ONS interests, is using street level imagery such as Google Street View. Researchers in the US have essentially taken a huge image download and they are using that to estimate demographic indicators, including things like income and wealth at local area level, and comparing that with the American Community Survey which is one of the equivalents of our Census. So what you’re seeing is that you can get from a completely different data source quite an accurate estimate of indicators you are interested in, which is then updatable and you can cut it in many different ways.” 

Smith says the Data Science Campus is itself using local street image data as a project. “We are starting on the image analysis, whether we have a way of downloading it and all sorts of things like whether you can use that for official ONS outputs. At this kind of exploration stage, it’s appropriate to look at the value of different sources.”

Data hunting v data mining

This “early day project” for the Campus is linked to natural capital.

Smith asks, “Can you use Google Street View or any local authority imagery of their area to predict and estimate the number of trees or other natural capital assets? That is a problem that is important to groups in ONS and Defra and has feedback to all sorts of things like air quality and the local environment. So there are some interesting links with policy and real issues. But for us it’s almost a test problem. In the US, researchers are using this data to look at cars in the local area, and out of that predict what kind of area it is. For example you can look at the type of car, age of car and so on.”

Smith describes this as like ‘data hunting’.

“If you have a thought, a theory, a model of something interesting in the world, you then look for data analysis that contests that. You could look at the number of skips or the number of broken windows and you can test that out.

“Data science when it’s done well matches statistical processes as well, and you definitely don’t start from a data mining perspective if you’re doing robust work. You start from a theory or model about what the world is like and then say, ‘That’s plausible.’ So we have this golden thread from, say, skips in the streets through to a measure of local building work with houses getting renovated through to an indicator of the local economy.

“We start from those sorts of things and then go and look at what data is available. In contrast, ‘data mining’ starts from a ‘data first’ stance - ‘I’ve got this huge amount of info and I’m going to throw it all into this statistical sausage machine and then see what pops out’. It shouldn’t be done that way and it’s something for us to be careful of.”

In general, argues Smith, the Office for National Statistics (ONS) needs to produce good data that helps people make decisions around the issues of the day.

“If we are doing our work well as the ONS, we should be providing data and analysis that helps government and businesses make those decisions. So it’s absolutely right that ONS – led by people like UK National Statistician John Pullinger and Director General for Data Capability, Heather Savory – is putting out timely information that people use in decisions. Government and businesses are going to make those decisions, whatever, because they have to be made. And they’ll be made with or without your info. So you have a role to put the information into those decisions. The ONS mission is better statistics for better decisions, which sums it up.”

Working with external organisations

Smith expects the Data Science Campus to work with a number of external organisations, such as universities.

“From our side we are looking to work with a number of different groups. There is lots of expertise and experience in universities, both in terms of the data science techniques – the theories and principles underlying work – but also a particular example of researchers looking at whether you can get value out of data. We are working with the Alan Turing Institute, based at King’s Cross in London, which is the national institute for data science and was set up by a consortium of five universities. Another group we are working with is the University of Warwick, where researchers are doing some very interesting work around how you use things like social media data to estimate size of crowds and tourism levels.

“One of the things we’ve been doing is setting up memorandums of understanding with universities to essentially strengthen those relationships. We’re looking to work jointly, so there are good opportunities to actually collaborate on projects. In future, we are also exploring Ph.D. students coming in, seconding in, both ways perhaps.”

There is also good involvement with other parts of government, including the Government Digital Service (GDS).

“Within government, the Government Digital Service (GDS) itself and the Cabinet Office do a lot of work on data science. GDS looks at professional framework and development and the idea of a data scientist as a profession, and that’s a very interesting area of conversation. And then we look at the capability development. So we work very closely with them and we partner on projects. We are together running the government’s Data Science Accelerator where ONS, GDS and Government Office for Science mentor people from across government on three-month projects from their business area. So that’s good skills capability development across government.”

Increasing the skills pipeline

Smith also has a second objective around increasing the pipeline of skills.

“Data scientists are expensive. He or she can go and work pretty much anywhere in any major organisation now and command large salaries. Their skills are quite high end at the moment. So there aren’t enough – demand outstrips supply. So there is a real pipeline issue for government. On the other hand, government, I believe, has the most interesting and important problems to solve.

“So in terms of how we build some of the skills, and the pipeline, then that is something that a lot of organisations are looking at very hard,” Smith continues. “What we’re doing at the Campus is a number of things. Firstly, increasing the number of people coming into the system, which is why we’re running MSc courses and apprenticeships - there is a really interesting thing to explore around vocational training.

“As well as school leavers, we are seeing people who have gone into university, but who may have dropped out after a year because it wasn’t real world enough, or people who are retraining as data scientists and analysts. We’re providing a two-year vocational course where they come with a mixture of training and learning and then real work on projects, helping deliver value to ONS and other organisations.”

The Campus had 140 applicants for the first round of the UK Data Analytics apprenticeship, and took on eight in December to work on campus as well as through the Learning Academy, an ONS group that supports training across ONS and government.

Metrics

Smith says there will be a number of measures by which he and the Data Science Campus will be judged by government in terms of success in changing ONS’s outlook and improving its performance.

“The two key areas will be around delivery and capability. Around project delivery, our impact will be measured in the way that we’re helping ONS innovate with data science: successful collaborations we’re working with; the areas across the ONS organisation that we’ve run experiments with; new data sources and/or new techniques that we’ve tested and so on. Around capability, our impact will be in how we’re working with and supporting data science capacity. That would be things like increased understanding of the potential for data science in improving policy, services, statistics across government, and the increased skills and capability of ONS and government to deliver value through data science.

“There will also be something around international support, because all of these issues are relevant to national statistics agencies around the world.”

Smith is keen to learn from other national statistics agencies, but says there is no gold standard to follow. “There is no one organisation that I’d point to. There are things that we can and should be learning from quite a number of different agencies. One example would be Statistics Netherlands, who have been working in this space for a little bit longer, and who we are working with. They have been doing some really interesting analysis such as embedding sensors in roads around the country. 60,000 sensors, every minute, ping back the volume of cars going past. And you can use that data to predict GDP, about 2-3 months in advance of official statistics. That is looking at a new data source and a way of analysing that in a way that gives you a slightly different view onto the same thing. New light on old problems.”







