As a close observer of California’s ongoing effort to build a “Cradle-to-Career” data system, I sometimes hear concerns about the quality of the data that would feed into such a system. Over the nearly 20 years I have conducted research on higher education issues, I have used student-level data from the California Community Colleges (CCC) and the California State University (CSU), as well as individual records of workers’ employment and earnings collected by the Employment Development Department (EDD), giving me some firsthand experience with the accuracy and completeness of these data. To learn more about the quality of California’s education and workforce data, and the possible implications for the work ahead to create a data system, I recently held in-depth conversations with 14 researchers and other experts who have significant experience using the data. I summarized my findings in a policy brief.

These experts agreed that data quality issues do not pose a barrier to developing a preschool-to-workforce (P20W) data system in California. They acknowledged that, as is the case with any large administrative data set, there are some issues with missing or inaccurate information. But such problems generally affect a relatively small share of cases, sometimes occurring when new data are collected or changes are made to how information is categorized. Different interpretations of data reporting instructions, as well as limited resources and staff capacity for collecting the data, can also introduce inconsistencies. Not surprisingly, data elements that are routinely used for a particular purpose (such as allocating funding, administering a program, or reporting on accountability metrics) are of higher quality than elements that serve little purpose for the agencies. While some data that California would like to include in a P20W system are not yet ready for statewide use (e.g., some early learning data), most public K-12, postsecondary, and workforce data are of good quality, with a level of problems that can be mitigated through effective quality control and data management practices.

So, why are there concerns about data quality? Clearly, information used to make decisions about public education must be of high enough quality that no harm will be done. One issue is that the timing of data reporting and updates to the information can mean that data in statewide reports do not match an institution’s own records, which can undermine trust in the information. The proliferation of online dashboards and other data tools is another issue, as they often include similar metrics with slightly different definitions. This can be confusing and make it difficult for data users to interpret the information or understand how and why it varies. While the problem is really one of definition and interpretation rather than the accuracy of the underlying data used to create the metrics, it contributes to the perception of poor data quality.

Another issue that the experts with whom I spoke emphasized is that the purpose of the data matters in assessing the significance of any data quality issues. If the data are to be used for calculating metrics, creating data dashboards, and conducting research to improve policy and practice (the purposes generally served by P20W data systems in the many other states that have them), having a small share of records with quality issues does not pose a significant problem. More caution is needed in considering the use of information in a statewide data system to provide services to individual students, as this purpose requires exceptional accuracy. It also poses greater challenges related to data privacy and security, and requires the information to be “real-time” or at least very up to date, which is technically difficult and costly to achieve.

Creating a P20W data system could actually help improve data quality. Linking data sets together provides more opportunities to cross-check information and identify inconsistencies. There are numerous opportunities at the institutional, system, and state levels to improve quality in California’s proposed data system, as outlined in the policy brief. Putting the data to use in ways that serve the reporting agencies themselves, and that address broader concerns related to education, labor market, and safety net planning, will help to prioritize quality. As one interviewee for my report said, “when it comes to data, use drives quality.”