This document is designed to offer current and potential PUF investigators some basic guidelines and recommendations for how to approach the data provided in the NCDB PUFs. This document specifically addresses the PUF investigators as the primary reader.
The Participant Use File (PUF) is pulled from the National Cancer Data Base (NCDB), a joint program of the American College of Surgeons Commission on Cancer (CoC) and the American Cancer Society (ACS), and is offered as an added value to clinical investigators at CoC-Accredited cancer programs who desire to conduct their own studies. The aim of the CoC and NCDB is to position investigators to successfully use the PUFs. There are a number of resources available to investigators.
Distributed PUFs are organ-specific, based upon specific ICD-O primary site and histology combinations, and should be sufficient to address the study proposals described in reviewed applications. The site-histology combinations used to select the cases provided in the PUF data sets are documented at http://seer.cancer.gov/siterecode/icdo3_dwhoheme/index.html.
If you previously received a PUF for the same site as your new file, you may notice that the number of cases in the new file has decreased, especially for older diagnoses. The data you received this year are limited to cases reported by currently accredited CoC hospitals. Cases reported by hospitals that are no longer accredited are excluded, because their case reports are no longer updated in the NCDB data and their quality cannot be assured. The principal effect will be on cases more than 5-10 years old.
In compliance with the Health Insurance Portability and Accountability Act (HIPAA) regulations, PUFs have been stripped of all direct patient identifiers and de-identified according to the "Safe Harbor" rules. The case identification number in each PUF is randomly assigned and will change with each PUF version release. The PUF Case IDs are not the same across cancer sites, and cases cannot be linked across cancer sites.
The 2013 PUF Release includes data for patients diagnosed from 2004 through 2013. The year of diagnosis should be used to select patients appropriate to the timeframe of the planned analysis. The availability of some data items is determined by diagnosis year, and not all data items in the PUF are available throughout the entire ten-year span of the PUF. To verify availability of data items by diagnosis year, be sure to review the description of each item in the on-line Data Dictionary.
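Selecting cases by year of diagnosis can be sketched as follows. This is a minimal illustration only: the item name YEAR_OF_DIAGNOSIS, the CASE_ID field, and the sample records are assumptions made for the example, so confirm the actual item names in the Data Dictionary.

```python
# Minimal sketch: keep only cases diagnosed within an analysis window.
# "YEAR_OF_DIAGNOSIS" and "CASE_ID" are assumed names, not taken from this guide.

def select_by_diagnosis_year(records, first_year, last_year,
                             year_key="YEAR_OF_DIAGNOSIS"):
    """Return records whose diagnosis year lies in [first_year, last_year]."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    return [r for r in records if first_year <= int(r[year_key]) <= last_year]


# Hypothetical records for illustration.
cases = [
    {"CASE_ID": "a", "YEAR_OF_DIAGNOSIS": 2004},
    {"CASE_ID": "b", "YEAR_OF_DIAGNOSIS": 2009},
    {"CASE_ID": "c", "YEAR_OF_DIAGNOSIS": 2013},
]
recent = select_by_diagnosis_year(cases, 2008, 2013)
print([r["CASE_ID"] for r in recent])  # ['b', 'c']
```

Restricting the window to the years in which every item of interest was collected avoids mixing cases where an item is missing by design with cases where it is genuinely unknown.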
All CoC-accredited programs that initially diagnose a patient or that provide all or part of the first course of treatment report the case to the NCDB. If more than one facility submitted a report, the PUF_MULT_SOURCE variable is coded 1 and only the “best” report is provided in the PUF. Where the reports differ, the best report is chosen based on the most recent patient contact with the program, completeness of coded detail, and/or edit quality. When reports tie, the record used is chosen arbitrarily. If this item is coded 0, only one facility provided a report for this cancer.
Every facility has a reference date, and the facility is accountable for the completeness of its data for cases diagnosed from that year through the present. Because a facility may request to move its reference date forward, a case’s diagnosis year sometimes falls before the facility’s reference date. The item REFERENCE_DATE_FLAG is coded 0 when this occurs and 1 when the diagnosis year is on or after the reference date year. Reports for cases whose diagnosis date is prior to the reference date cannot be changed or updated by the facility.
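One way to act on REFERENCE_DATE_FLAG is to separate the two groups before analysis, for example to run a sensitivity analysis that excludes cases diagnosed before the reference date. The sketch below assumes the flag arrives as 0 or 1 (as an integer or a numeric string). The CASE_ID field and the sample records are hypothetical.

```python
# Minimal sketch: split cases by REFERENCE_DATE_FLAG.
# Flag meaning follows the text: 1 = diagnosis year on or after the
# facility's reference date year, 0 = diagnosis year before it.

def split_by_reference_date(records, flag_key="REFERENCE_DATE_FLAG"):
    """Return (on_or_after, before) lists according to the flag value."""
    on_or_after, before = [], []
    for r in records:
        flag = int(r[flag_key])
        if flag == 1:
            on_or_after.append(r)
        elif flag == 0:
            before.append(r)
        else:
            raise ValueError(f"unexpected {flag_key} value: {r[flag_key]!r}")
    return on_or_after, before


# Hypothetical records for illustration.
cases = [
    {"CASE_ID": "a", "REFERENCE_DATE_FLAG": 1},
    {"CASE_ID": "b", "REFERENCE_DATE_FLAG": "0"},
    {"CASE_ID": "c", "REFERENCE_DATE_FLAG": 1},
]
accountable, not_updatable = split_by_reference_date(cases)
print(len(accountable), len(not_updatable))  # 2 1
```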
The data item Sequence Number refers to the sequence of malignant and non-malignant tumors diagnosed in a patient and is used to distinguish cases with multiple cancer diagnoses. By default, your PUF includes all sequence codes available for each reported patient. Patients with only one lifetime cancer diagnosis will have a sequence number code value of 00. Sequence number 01 indicates that the reported tumor is the first of multiple diagnoses. The NCDB has no mechanism by which to link separate case reports of the same patient.
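Analyses are sometimes restricted to a patient's only or first cancer diagnosis. The sketch below keeps sequence numbers 00 and 01 as described above. The item name SEQUENCE_NUMBER and the sample records are assumptions for illustration. Codes are normalized to two characters because statistical packages and spreadsheets often drop the leading zero on import.

```python
# Minimal sketch: keep cases coded as the only (00) or first (01) diagnosis.
# "SEQUENCE_NUMBER" and "CASE_ID" are assumed names; check the Data Dictionary.

FIRST_OR_ONLY = {"00", "01"}


def normalize_sequence(value):
    """Return the sequence code as a zero-padded two-character string."""
    return str(value).strip().zfill(2)


def first_or_only_primary(records, key="SEQUENCE_NUMBER"):
    """Return records whose sequence number is 00 or 01."""
    return [r for r in records if normalize_sequence(r[key]) in FIRST_OR_ONLY]


# Hypothetical records, including codes that lost their leading zero.
cases = [
    {"CASE_ID": "a", "SEQUENCE_NUMBER": 0},
    {"CASE_ID": "b", "SEQUENCE_NUMBER": "01"},
    {"CASE_ID": "c", "SEQUENCE_NUMBER": "02"},
]
kept = first_or_only_primary(cases)
print([r["CASE_ID"] for r in kept])  # ['a', 'b']
```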
The PUF only includes "analytic cases", that is, cases whose initial diagnosis and/or treatment was performed at the reporting facility. Class of Case 00 denotes cases diagnosed at the reporting facility that did not receive any treatment at that facility. Class of Case codes 10-14 denote cases that were initially diagnosed at the reporting facility and received all or part of their treatment there. Class of Case codes 20-22 denote patients who were diagnosed at another facility and received all or part of their treatment at the reporting facility.
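The Class of Case groupings described above can be collapsed into three analysis categories. The sketch below encodes only the code ranges stated in this guide. The category labels are made up for the example, and any code outside those ranges is returned as "other" rather than guessed at.

```python
# Minimal sketch: group Class of Case codes into the categories in the text.
# The labels are illustrative, not official NCDB terminology.

def class_of_case_group(code):
    """Map a Class of Case code (int or numeric string) to a category label."""
    value = int(code)
    if value == 0:
        return "diagnosed here, treated elsewhere or not treated here"
    if 10 <= value <= 14:
        return "diagnosed here, all or part of treatment here"
    if 20 <= value <= 22:
        return "diagnosed elsewhere, all or part of treatment here"
    return "other"


for example in ("00", 12, 21, 35):
    print(example, "->", class_of_case_group(example))
```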
The PUF is limited to cancer programs currently accredited by the CoC. Hospital Type provides a general classification of the reporting facility's structural characteristics, and defines a portion of the criteria required for CoC Accreditation. PUFs identify reporting facilities as one of four types: Community, Comprehensive, Teaching/Research hospitals, or Integrated Network Cancer Programs. These categories follow the classification scheme used by the CoC accreditation program, and are determined by a variety of factors.
Area-based or environmental measures of patient income and education are provided in the PUF. These measures are derived by linking the reported ZIP code of the patient's residence at the time of diagnosis to year 2000 Census data. The data describing median household income and level of educational attainment characterize the ZIP code area in which the patient resides, not the individual patient. Because the decennial census has used only the short form since 2010, the majority of the information traditionally collected by the decennial census is now collected in the American Community Survey (ACS).
Comorbid disease burden is represented in the PUF as a score. This score is based on the Deyo adaptation (1992) of Charlson's comorbidity index and can be used as a mechanism to control for pre-existing medical conditions that may affect treatment decisions. The score is mapped from as many as ten reported ICD-9-CM secondary diagnosis codes.
The AJCC clinical and pathologic stage groups included in the PUF are TNM-based and are coded or reported according to the edition of the AJCC Staging Manual corresponding to the patient’s diagnosis year. The fifth edition applies to cases diagnosed from 1998 through 2002. The sixth edition describes the anatomic extent of disease for patients diagnosed from 2003 through 2009. Patients diagnosed in 2010 are staged according to the seventh edition. Exercise caution when using staging information, because stage groups assigned under different editions may not be directly comparable.
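Because the staging edition depends on diagnosis year, it can help to attach the edition to each case and stratify or restrict on it. The sketch below encodes the year ranges given above. It also applies the seventh edition to 2011 through 2013, which is an assumption: the text names only 2010, so confirm the later years against the Data Dictionary.

```python
# Minimal sketch: look up the AJCC Staging Manual edition for a diagnosis year.
# Ranges for 1998-2009 and the 2010 start year come from the text; extending
# the seventh edition through 2013 (the last PUF year) is an ASSUMPTION.

AJCC_EDITION_RANGES = [
    (1998, 2002, 5),
    (2003, 2009, 6),
    (2010, 2013, 7),  # assumed upper bound; the guide only mentions 2010
]


def ajcc_edition(diagnosis_year):
    """Return the AJCC edition number for a diagnosis year, or raise ValueError."""
    year = int(diagnosis_year)
    for first, last, edition in AJCC_EDITION_RANGES:
        if first <= year <= last:
            return edition
    raise ValueError(f"no AJCC edition recorded here for year {year}")


print(ajcc_edition(2004), ajcc_edition(2009), ajcc_edition(2010))  # 6 6 7
```

A study spanning 2004 through 2013 would therefore cross an edition boundary, so stage-based comparisons are usually safer within a single edition.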
Several PUF projects examine one or more laboratory prognostic indicators. These are available as Site Specific Factors (SSF) collected as part of the Collaborative Stage Data Collection System (CS). The term “collaborative” means that the data collection tool was devised to meet the various needs of cancer registry data standard setters such as the Commission on Cancer (CoC), the Surveillance, Epidemiology, and End Results (SEER) Program, and the National Program of Cancer Registries (NPCR).
The treatment information provided in the PUF is limited to "first course of treatment", which is defined as all methods of treatment recorded in the treatment plan and administered to the patient before disease progression or recurrence. "No therapy" is a treatment option that occurs if the patient refuses treatment, the family or guardian refuses treatment, the patient dies before treatment starts, or the physician recommends no treatment be given.
The PUF includes a "crow-fly" or great circle distance measure between the latitude and longitude of the centroid of the patient's ZIP code of residence and the latitude and longitude of the facility mailing address. The precision of this item as an indicator of the true distance traveled depends on the spatial area of the ZIP code and on the proximity of the facility’s administrative mailing address to the actual treatment center.
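To make the idea of a great circle distance concrete, the sketch below uses the haversine formula, a standard way to compute the distance between two latitude/longitude points on a sphere. It is an illustration only: the guide does not say which formula, Earth radius, or unit the NCDB uses, so the choice of miles and the radius constant are assumptions.

```python
import math

# Mean Earth radius in miles (assumed unit and value for this illustration).
EARTH_RADIUS_MILES = 3958.8


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# One degree of latitude is roughly 69 miles.
print(round(great_circle_miles(0.0, 0.0, 1.0, 0.0), 1))  # 69.1
```

Because the patient's location is a ZIP code centroid rather than a street address, two patients in the same ZIP code receive the same distance to a given facility, however far apart they actually live.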
The CoC accreditation standards require an annual 90% follow-up rate for all living, eligible, analytic patients diagnosed within the last 5 years and an 80% follow-up rate for all eligible analytic cases from the cancer registry's reference date. Participating registries report patient follow-up to the NCDB annually. The PUF data do not include cause of death information, so cause-specific survival cannot be calculated.
Since 1998, all CoC-accredited cancer programs have been required to submit case reports to the NCDB in response to the annual Call for Data. In the NCDB PUF, each facility is assigned a random ID, PUF_FACILITY_ID. This ID is assigned regardless of cancer site, so researchers may identify the same facilities across cancer sites. The number of CoC-accredited cancer programs changes from one diagnosis year to the next.
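A quick way to see how facility representation varies over time is to count the distinct PUF_FACILITY_ID values per diagnosis year. The sketch below assumes a year item named YEAR_OF_DIAGNOSIS and uses made-up records. It counts facilities that appear in the file for a given year, which is not necessarily the same as the number of accredited programs.

```python
from collections import defaultdict

# Minimal sketch: distinct reporting facilities per diagnosis year.
# "YEAR_OF_DIAGNOSIS" is an assumed item name; PUF_FACILITY_ID is from the text.

def facilities_per_year(records, facility_key="PUF_FACILITY_ID",
                        year_key="YEAR_OF_DIAGNOSIS"):
    """Return {diagnosis_year: number of distinct facility IDs}, sorted by year."""
    seen = defaultdict(set)
    for r in records:
        seen[int(r[year_key])].add(r[facility_key])
    return {year: len(ids) for year, ids in sorted(seen.items())}


# Hypothetical records for illustration.
cases = [
    {"PUF_FACILITY_ID": "F1", "YEAR_OF_DIAGNOSIS": 2004},
    {"PUF_FACILITY_ID": "F2", "YEAR_OF_DIAGNOSIS": 2004},
    {"PUF_FACILITY_ID": "F1", "YEAR_OF_DIAGNOSIS": 2004},
    {"PUF_FACILITY_ID": "F1", "YEAR_OF_DIAGNOSIS": 2005},
]
counts = facilities_per_year(cases)
print(counts)  # {2004: 2, 2005: 1}
```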