Data as a Public Good

Municipal governments control many datasets that are very useful to social services, community planning groups, health advocates, and many other civic and social organizations. This document assesses data needs for these organizations and proposes that municipal governments should commit to publishing these datasets rather than requiring data users to request them.

Also available as a PDF. 

Properly managing any modern organization requires data, no less so for the social and civic organizations that form the backbone of a healthy community. Unfortunately, these organizations face roadblocks acquiring and using data they need. The problem is not the existence of data—we have never had more data than we do today—but rather the difficulty associated with getting the correct data in usable forms. Correcting this situation must start by studying who in a community uses data and what needs they have.

This document presents the results from a series of interviews with various types of data consumers. This analysis does not aim to be comprehensive; rather, it is representative of a sample of types of data users. The goal is to identify some of the high-value San Diego municipal government datasets that would benefit civic development if they were released to the public.

Users from a range of civic and social organizations were interviewed about their data use, their data needs, and the problems they have with using data. About 30 users from 21 groups and organizations were interviewed, including:

  • 8 community planning groups
  • 3 affordable housing advocates and developers
  • 3 city council members
  • 7 other nonprofits

There are many more users and user types to consider in a complete analysis, but because a small number of datasets were mentioned by most of the interviewees, we believe that this set is sufficiently representative for an initial assessment.

The interviewees reported using or wanting a range of data that fell into these categories:

  • Housing. Types and locations of housing, housing costs, market rents, and ownership.
  • Crime & Public Safety. Incident-level data about arrests, and calls for service.
  • Infrastructure. 311 calls, construction permits, capital improvement projects, and defects.
  • Transportation. Street segment traffic counts and collision locations.
  • Business & Economy. Business locations and aggregated tax payments.
  • Homelessness. Locations of homeless people, encampments, and related arrests.
  • Health. Disease prevalence.

The datasets in these categories primarily originate in government organizations, although some are available from private organizations, both commercial and nonprofit, as summarized in the table below.

Municipal Govt State Govt National Govt Private Commercial Private Nonprofit
Transportation x x x
Infrastructure x
Housing x x x
Homelessness x
Health x x x x
Crime and Public Safety x
Business and Economy x x x x x

Table 1. Data categories and primary sources.

Among the datasets and sources identified, those that were most inaccessible and had the highest marginal value were those held by municipal governments. The following datasets held by municipal governments were mentioned most frequently as being valuable to civic and social organizations:

  • Summaries of arrests
  • Summaries of service calls for fire and police
  • Street segment traffic counts
  • 311 reports
  • Construction permits
  • Capital improvement projects
  • Business tax payments, aggregated to census block by NAICS code

These are datasets that municipal open government efforts should strive to make available.
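
As an illustration of the last item in the list above, aggregating business tax payments to census block by NAICS code could look like the following minimal Python sketch. The record layout, the field names (`census_block`, `naics`, `tax_paid`), and all values are hypothetical and invented for illustration; this report does not describe an actual municipal schema.

```python
from collections import defaultdict

# Hypothetical per-business records; every field name and value is invented.
records = [
    {"census_block": "060730001001000", "naics": "722511", "tax_paid": 1200.0},
    {"census_block": "060730001001000", "naics": "722511", "tax_paid": 800.0},
    {"census_block": "060730001001000", "naics": "445110", "tax_paid": 500.0},
    {"census_block": "060730002002000", "naics": "722511", "tax_paid": 300.0},
]


def aggregate_tax_payments(rows):
    """Sum tax payments for each (census block, NAICS code) pair."""
    totals = defaultdict(float)
    for row in rows:
        key = (row["census_block"], row["naics"])
        totals[key] += row["tax_paid"]
    return dict(totals)


summary = aggregate_tax_payments(records)

# Print one line per (block, NAICS) pair, in sorted order.
for (block, naics), total in sorted(summary.items()):
    print(block, naics, total)
```

Publishing only the per-block totals, rather than the individual records, is what allows a dataset like this to be shared without exposing any single business's payments.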

Limits to Data Access

Interviewees reported that many valuable datasets were difficult or impossible to acquire, usually for one of the following reasons:

  1. The dataset is owned by a private corporation, and the price is too high.
  2. The data is available, but because of privacy issues it is aggregated to a high-level geography. For instance, the data reports values per county, which cannot be used for analyzing neighborhoods.
  3. The data is available, but privacy rules like HIPAA mandate controls that the organization can’t implement.
  4. The data is held by an agency incapable of releasing the data.
  5. The data is held by an agency that refuses to release the data.

The first three problems are matters of cost or negotiation and fall within the realm of normal business issues. While these access issues are obstacles, they can be overcome with time, money, or creativity. The final two are more problematic and will be considered in more detail.

Item (4), limited access because of agency capability, often appears to be the result of an inflexible, uncooperative bureaucracy, but it is really a lack of technology and process. Few government agencies, particularly at the municipal level, have the software and processes to release data. Data has not been vetted to ensure that releasing it will not pose risks to the organization, and agency staff lack software support for publishing it easily to the web. As a result, data releases tend to be ad hoc and time-consuming, and there is no additional budget for the work. These issues can be solved by tackling them directly, most effectively with a high-level mandate for increased openness and transparency, followed by software and services that limit the cost of complying with the mandate.

Problem (5), limited access because of agency intransigence, is much more difficult. The agency that was mentioned most frequently (and with one exception, solely) as being unwilling to release data is SANDAG. Interviewees noted difficulty in getting crime data, traffic counts, and traffic model inputs and outputs from SANDAG. SANDAG restricts access to these datasets by charging access fees, limiting the number of records provided, publishing the data only in difficult-to-use formats, or refusing outright to release the data. Several interviewees said that the only way to get SANDAG to cooperate is to have a lawyer send the request, sometimes with the threat of a lawsuit. Because the datasets that SANDAG controls (crime and traffic data) have a high social value, and because the time, cost, and effort required to force SANDAG to comply with Public Records Requests are so onerous, the agency’s disinclination to release data imposes a significant social cost.

In a detailed analysis of 18 Public Records Requests to SANDAG for crime and traffic data over the last three years, requests made by journalists and county employees were handled quickly and without cost. Requests made by citizens who were not journalists or students were not fulfilled—largely denied indirectly by offering to fill these requests at costs ranging from $500 to $3000. Although this limited analysis cannot make a definitive assessment of SANDAG policy, it appears SANDAG management is restricting access to data by setting unreasonably high fees.

[box type=”info”] In March and April of 2013, we received a large amount of traffic data and information from SANDAG, including direct emails from senior policy analysts and a very fast and complete response to a PRA request for traffic data. Crime data continues to be difficult and expensive to acquire, but the agency has been easy to work with for traffic data. [/box]

[box type=”info”] In August of 2013, SANDAG began publishing crime data on its website. While there are still some issues with the usability of the data, this publication is a major step toward openness and resolves most of the concerns raised in this report. [/box]

Recommendations for Better Civic Use of Data

To ensure that cities realize the full value of the data they produce and manage, we recommend that municipal governments:

  • Require release. Identify datasets of high public value, and mandate their release.
  • Make release easy. Provide departments with simple tools for publishing data.
  • Make use easy. Work with external organizations to serve the needs of the community.

These recommendations will increase the pace of civic development in the San Diego region, and, through public-private partnership, can be implemented with minimal cost.

Require Release: Identify Datasets and Mandate Release

Municipalities should identify the datasets that have the highest social value and formally mandate that departments release that data.

When data has a social value, not releasing the data has a social cost, a cost which impedes community development. We recommend that municipal governments identify the data that has the highest value to citizens, nonprofits, planning groups, and businesses and formally mandate its release, requiring agencies to publish data to the Internet, rather than requiring citizens to use the PRA process.

Developing a list of the datasets to release should involve an extension of this analysis, followed by a community process to ensure that the datasets are valuable to users. The community process should involve government departments from all of the regional governments, because the most common users of data shared by a governmental department are staff at other government departments.

This initial analysis suggests that the most important datasets for municipal governments to release are in the categories of crime and public safety, infrastructure, and transportation—datasets that should be considered in more detail.

Crime and Public Safety

Because public safety is the largest single category in a city’s budget, it is no surprise that public safety data is important to communities. Crime and public safety data are valuable for a wide range of uses but are also the hardest to acquire. This data should be treated as comparable in importance to the demographic data collected by the US Census and formally released to the public. Instead, the data is restricted and limited, with a few notable exceptions, such as the San Diego Fire Dispatch online application.

Municipalities should mandate the release of:

  • Summary arrest records, including time, date, type, location, and description of crime.
  • Calls for service, for both police and fire.
  • CAPPed property lists.

In California, summary arrest records are considered public data and include the time, date, category, location, and brief description of arrests. However, although these records are explicitly referenced as public data in the California Public Records Act, San Diego police departments and SANDAG are very reluctant to release crime incident data and will release only small numbers of records, for a fee, unless the Public Records Act request is issued by a lawyer. For most nonprofits, crime data adequate for long-term trend analysis is effectively impossible to acquire.

Because of the value to the community and the difficulty in obtaining these datasets, crime and public safety data should be a top priority for transparency efforts to secure and ensure continued public access.

City Infrastructure

Second only to public safety, the maintenance of streets, sidewalks, utilities, and other infrastructure is a major category attracting citizens’ complaints, concerns, and comments. When cities release infrastructure information, citizens are better able to understand the city’s budget constraints and can become participants in the process of setting priorities, rather than adversaries.

Some of the infrastructure datasets identified in this analysis as valuable are:

  • Construction permits
  • Capital improvement projects
  • 311 calls
  • Asset maps, locations of street lights, and traffic lights
  • Other contracted work

Our initial analysis indicates that, at least in the City of San Diego, the departments that control these datasets are very interested in making them available to the public, with the primary impediment being the effort required to release them. Although the political will exists to release more data, a mandate to release these datasets may still be valuable, as it would allow the departments to increase the priority of data efforts when creating their budgets.

Transportation

While citizens are concerned about traffic congestion and road safety, it is mainly nonprofits who use traffic data. Transportation data is instrumental to the operation of transportation advocates like Move San Diego and Walk San Diego, policy centers like the Equinox Center, affordable housing advocates who study jobs/housing fit, health advocates, social service organizations, and many others. Unfortunately, good, usable traffic counts and traffic model inputs and outputs are very difficult to get from SANDAG.

We recommend that cities either require SANDAG to publish the following datasets as spreadsheets rather than PDFs, or request the data from SANDAG and publish it themselves. Some of the most valuable traffic datasets are:

  • Traffic counts
  • Traffic surveys
  • Transportation model outputs

Because these datasets have been restricted and difficult to obtain, mandates to publish them would be particularly valuable.

Make Release Easy: Provide Simple Software and Dedicated Help

To ensure that municipal staff participate in data release efforts, it is important to make releasing data easy and to provide dedicated support, preferably without impact to departments’ budgets. 

Few municipal departments already have the processes, staff, and software for properly releasing data, and the experience of many data management efforts is that without high-level mandates, support, and training, programs that add extra work for staff are likely to fail.

A solution to this problem is to allow staff to submit data in whatever form is easiest for them and centralize the collection, cleaning, and publication of data outside the departments that provide the data. Then, as the data release programs mature, departments can add specialized software and processes to make data release more formal and automatic.

We recommend that municipalities make data release easy for staff by providing:

  • Simple software for staff to submit data, such as one of the open source data repositories, CKAN or DKAN.
  • A dedicated person who can work across departments to help staff find and upload data.
  • Data cleaning and publication services that are outside of the departments that produce the data.

Data is most valuable when it can be combined with other data, which requires that datasets adhere to standards that establish common fields where links can be made. Data use is most efficient when data users are consulted about their needs so the data can be structured to make analysis easy. The easiest way to accommodate both of these requirements is to provide a single organization that works with both data users and data producers to broker information exchange.
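
A minimal Python sketch of what a common field makes possible: two hypothetical datasets, 311 reports and construction permits, each carry a `census_block` field, so they can be linked without manual matching. The datasets, field names, and values are all assumptions made for illustration, not a description of any actual city data.

```python
from collections import defaultdict

# Hypothetical datasets that share one standardized field, "census_block".
reports_311 = [
    {"census_block": "A", "report_type": "pothole"},
    {"census_block": "B", "report_type": "streetlight"},
]
permits = [
    {"census_block": "A", "permit_type": "sidewalk repair"},
    {"census_block": "A", "permit_type": "resurfacing"},
    {"census_block": "C", "permit_type": "new construction"},
]


def join_on(left, right, key):
    """Pair every row in `left` with each row in `right` that has the same `key` value."""
    # Index the right-hand dataset by the shared field for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for row in left:
        # Rows with no match in `right` are dropped (an inner join).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


linked = join_on(reports_311, permits, "census_block")
for row in linked:
    print(row)
```

If the two datasets identified locations differently, for example one by street address and the other by parcel number, this kind of link would first require a translation step, which is the cost that shared standards avoid.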

Because the goal of the data release effort is to make data public, and most of the data users are outside of municipal government, the role of the data broker can be played by a non-governmental organization; this is the intent behind the San Diego Regional Data Library. Using an external organization has many benefits, including a broader connection to users, more flexible cost structure, and immunity from changes in administration. However, cities that want more control to distribute data that cannot be made public can set up internal data departments using the same software and processes as those employed by an external organization.

Make Use Easy: Use Partnerships to Serve Data Users

Data is only valuable if it is usable, and making data usable requires studying users, a task better suited to private organizations who work closely with users. 

While it is natural for a municipal government pursuing a transparency program to publish its own data, many cross-community efforts have concluded that data distribution has the highest value when data is stored and distributed outside of government. This was the conclusion of an effort that began in San Diego in 2011, led by Planning Director Bill Andersen and the National Neighborhood Indicators Partnership. Nationwide, many of the most successful data intermediaries, such as the Connecticut Data Collaborative, are public-private partnerships.

Government’s internal efforts can begin with good intentions, but because they are subject to the changing priorities of each transition in administration, a stable, long-term effort should be a partnership with government, not a program of government.

There are a variety of models that can be employed, ranging from having a nonprofit run centralized repositories for both government and private data users, to distributed models with multiple data repositories. Regardless of the model chosen, it is important to recognize the importance of working closely with end users to understand their needs and ensure that the data products produced are useful. Working with organizations that have close connections to data users is an excellent way to satisfy this requirement.

Open Data Accelerates Community Building

Municipalities exist to organize, protect, and develop communities, and they can better serve this mission, at a lower cost, by allowing the community to more fully participate in data collection, analysis, and use. Making the best use of data will require both government and private organizations to work together, and we suggest that the best arrangement for the San Diego region will involve:

  • Municipal governments identifying the high-value datasets they produce and mandating their release.
  • Community representatives, such as Regional Taskforce for the Homeless, the San Diego Housing Alliance, the Malin Burnham Center for Civic Engagement, and many other social and civic organizations, working together to identify data needs.
  • Data producers, such as universities, hospitals, social service organizations, and governments collaborating to share and distribute data.
  • Data intermediaries, such as the San Diego Regional Data Library, providing technology and management support to the collaborations.

The cities of the San Diego region have an unprecedented opportunity to accelerate community development efforts region-wide by creating the type of data collaboratives that other cities have benefited from for decades. Building the technology and relationships will take time and dedication from many players, so we should start now. We at the San Diego Regional Data Library look forward to contributing to this effort. If you or your organization is building a piece of this new civic infrastructure, we’d like to meet you; please contact the Director of the Library, Eric Busboom, at eric@sandiegodata.org or 858-386-4134.