Background

JPSurv was developed to analyze trends in survival by year of diagnosis [1]. Survival data involve two time scales that must both be taken into account: the calendar year of diagnosis and the time since diagnosis. The JPSurv model extends the Cox proportional hazards model and the relative survival model of Hakulinen and Tenkanen [2]. JPSurv fits a proportional hazards joinpoint model to survival data on the log hazard scale. Joinpoint models consist of linear segments connected at “joinpoints,” which represent the times at which trends changed. The hazard of cancer death is specified as the product of a baseline hazard, on the time-since-diagnosis scale, and a multiplicative factor describing the effect of year of diagnosis and other covariates. The year of diagnosis effect is modeled as joined linear segments on the log scale. The number and location of joinpoints are estimated from the data. The model implies that the hazard of cancer death, as a function of time since diagnosis, is proportional for subjects diagnosed in different calendar years. JPSurv uses discrete-time survival data, i.e., data grouped by years since diagnosis in life table format. The software can accommodate both relative survival and cause-specific survival.
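
The model structure described above can be illustrated with a short Python sketch. It is not JPSurv's implementation: the function names, the slopes, the joinpoint year, and the baseline hazard are all hypothetical, and the link between hazard and interval probability of death is the standard grouped-data form of a proportional hazards model, assumed here for illustration.

```python
import math

def year_effect(year, start_year, joinpoints, slopes):
    """Log-hazard effect of diagnosis year as continuous joined linear segments.

    slopes[k] is the slope per calendar year within segment k, so
    len(slopes) must equal len(joinpoints) + 1. All values are hypothetical.
    """
    if len(slopes) != len(joinpoints) + 1:
        raise ValueError("need exactly one slope per segment")
    knots = [start_year] + sorted(joinpoints)
    effect = 0.0
    for k, slope in enumerate(slopes):
        seg_end = knots[k + 1] if k + 1 < len(knots) else math.inf
        # Number of calendar years of this segment elapsed by `year`.
        years_in_segment = min(year, seg_end) - knots[k]
        if years_in_segment <= 0:
            break
        effect += slope * years_in_segment
    return effect

def interval_death_probability(baseline_hazard, effect):
    """Probability of death in one follow-up interval (assumed grouped-data form).

    The hazard is the baseline hazard for that interval multiplied by
    exp(effect), which is what makes hazards proportional across years.
    """
    return 1.0 - math.exp(-baseline_hazard * math.exp(effect))

# Hypothetical example: one joinpoint at 1990, with a steeper decline afterwards.
slopes = [-0.02, -0.05]
for year in (1975, 1990, 2000):
    e = year_effect(year, 1975, [1990], slopes)
    print(year, round(e, 3), round(interval_death_probability(0.10, e), 4))
```

In this sketch a negative slope means the hazard of cancer death falls with later diagnosis years, and the joinpoint marks the year at which the rate of decline changes.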

The purpose of this tutorial is to provide a step-by-step illustration of how to use JPSurv to analyze changes in survival trends.


Objective

The goal is to estimate relative survival trends from 1975 for patients (any sex, any stage) diagnosed with Non-Hodgkin Lymphoma (NHL) or Chronic Myeloid Leukemias (CML). The last diagnosis year (2018 in this case) is usually excluded because some deaths in that year may not yet have been recorded, which can lead to overestimated survival.


Step 1: Extract the data from SEER*Stat

JPSurv can read in grouped relative or cause-specific survival data generated from SEER*Stat. The minimum variables required are: calendar year of diagnosis, survival time interval (i.e., years from diagnosis), number at risk at beginning of interval, number of deaths, number of cases lost to follow-up, and, in the case of relative survival, interval expected survival. Data may also include other covariates of interest, such as cancer site, sex, stage, etc.
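
To show how the required variables fit together, the following Python sketch computes relative survival from life table rows using the standard actuarial method. This is an assumption made for illustration: SEER*Stat performs these calculations itself and its exact method may differ, and the dictionary keys and counts below are hypothetical.

```python
def relative_survival(rows):
    """Compute interval and cumulative relative survival for one diagnosis year.

    `rows` must be ordered by follow-up interval. Each row holds the number
    at risk at the start of the interval, deaths, cases lost to follow-up,
    and interval expected survival as a proportion in [0, 1].
    """
    cumulative_observed = 1.0
    cumulative_expected = 1.0
    results = []
    for row in rows:
        # Actuarial assumption: cases lost to follow-up are at risk for
        # half of the interval on average.
        effective_at_risk = row["n_at_risk"] - row["lost"] / 2.0
        interval_observed = 1.0 - row["deaths"] / effective_at_risk
        cumulative_observed *= interval_observed
        cumulative_expected *= row["expected"]
        results.append({
            "interval": row["interval"],
            "interval_relative": interval_observed / row["expected"],
            "cumulative_relative": cumulative_observed / cumulative_expected,
        })
    return results

# Hypothetical counts for the first two years after diagnosis.
example = [
    {"interval": 1, "n_at_risk": 1000, "deaths": 200, "lost": 0, "expected": 0.98},
    {"interval": 2, "n_at_risk": 800, "deaths": 80, "lost": 20, "expected": 0.98},
]
for result in relative_survival(example):
    print(result)
```

For cause-specific survival the expected survival column is not needed, which corresponds to setting `expected` to 1.0 in this sketch.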


Opening the survival session, selecting the database, and selecting the cases
  1. Open the survival session
  2. Select database: Select the SEER*Stat database (SEER Research Data, 9 registries (1975-2018) Nov 2020 submission)
  3. Select Relative survival: Go to the STATISTICS tab and select “Relative survival”
  4. Selection of years of diagnosis 1975-2017 and cancer sites. Go to the SELECTION tab. In the CASE SELECTION box click EDIT. In the VARIABLE box go to:
    1. “Race, Sex, Year Dx” folder. Click Year of diagnosis and select years 1975 through 2017
    2. “Site and Morphology” folder. Click Site Recode ICD-O-3/WHO 2008. Select Non-Hodgkin Lymphoma and Chronic Myeloid Leukemias

Defining the intervals for calculations
  1. Define 10 annual (or 12 month) intervals for calculation of relative survival and the output format
    1. In the INTERVAL box in PARAMETERS tab, for “Month Per” select or type 12. For “Number” select 10.
    2. In the DISPLAY Box check Standard Life.

Defining the intervals for calculations using the CANSURV/JPSurv Output format

Creating the year of diagnosis and cancer site variables for stratification
  1. Include the variables that will be used to stratify survival calculations. Go to the TABLE tab.
    1. Create a new variable that will include single year of diagnosis from 1975 through 2017. Click on the “Race, Sex, Year Dx” folder. Click Year of diagnosis and then the CREATE button. Delete 1975-2018 and 2018 values and save it with a new name. Add this as a page, row, or column (it does not matter which)
    2. Create a new variable that will include only 2 sites NHL and CML. Using the same process, click on the “Site and Morphology” folder. Click Site Recode ICD-O-3/WHO 2008 and then the CREATE button. Delete all values, select NHL and CML. Save it with a new name. Add this as a page, row, or column (it does not matter which)
  2. Run SEER*Stat

Saving and Exporting
  1. Save the results and export the matrix. Go to the Matrix in the main menu. Matrix -> Export -> Results as Text File

Step 2: Reading the data into JPSurv, specifying parameters, and running JPSurv

Importing data using Dic/Data files

Data sets generated in SEER*Stat must include year at diagnosis as a covariate. To reflect trends in calendar years, 12 months per interval needs to be specified. In addition, the “Display/Standard Life” option in the Parameters tab must be checked as JPSurv does not read the “Summary Table” format. Data can be exported directly from SEER*Stat and saved as a .txt file with a corresponding dictionary .dic file.

  1. Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
  2. Confirm that Dic/Data Files is selected in the File Format section.
  3. Click the Choose Files button and select the .txt and .dic files exported in Step 1. Note that you need to select both at the same time.
  4. Press Upload Input Files button

Importing data using CSV files

JPSurv can also read data from delimited text files with common delimiters (comma, semicolon, or tab). Users need to know what is stored in each column to use JPSurv correctly. At a minimum, the data must include: calendar year of diagnosis, survival time interval, number at risk at beginning of interval, number of deaths, number of cases lost to follow-up, and, in the case of relative survival, interval expected survival. Data may also include other covariates of interest, such as cancer site, sex, stage, etc. The survival time intervals must be of equal length and must not have any gaps. When using a delimited text file as input, the user must also convert the interval expected survival column from percentages to proportions, because the acceptable range of inputs is [0,1].

  1. Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
  2. Confirm that CSV Files is selected in the File Format section.
  3. Click the Browse button and select the .csv or .txt file exported in Step 1.
  4. Once the file is selected a new window will pop up for CSV Configuration
  5. Choose the appropriate delimiter
  6. If the file contains headers, make sure the box titled “Does the file contain headers?” is checked. Otherwise, uncheck the box.
  7. Choose the appropriate data type, either Relative Survival or Cause-Specific Survival.
  8. Choose the appropriate display option for rates, either Percents or Proportions.
  9. You can change the desired number of rows to display
  10. Choose the drop-down menu for each column and select the parameter corresponding to each column. Note that all required parameters must be mapped to the appropriate column.
  11. Press Save.
  12. Press Upload.
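
Before uploading a delimited file, the requirements above (expected survival as a proportion, follow-up intervals without gaps) can be checked with a short script. The Python sketch below is only an illustration: the column names are hypothetical, since JPSurv lets the user map columns during CSV Configuration, and it assumes intervals are numbered consecutively from 1 within each diagnosis year.

```python
import csv
import io

def prepare_rows(text, delimiter=","):
    """Parse delimited text and convert expected survival from percent to proportion."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for rec in reader:
        rows.append({
            "year": int(rec["year"]),
            "interval": int(rec["interval"]),
            "n_at_risk": int(rec["n_at_risk"]),
            "deaths": int(rec["deaths"]),
            "lost": int(rec["lost"]),
            # Hypothetical source column holding percentages, e.g. 98.5.
            "expected": float(rec["expected_pct"]) / 100.0,
        })
    return rows

def check_rows(rows):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    intervals_by_year = {}
    for row in rows:
        if not 0.0 <= row["expected"] <= 1.0:
            problems.append(f"{row['year']}: expected survival outside [0, 1]")
        intervals_by_year.setdefault(row["year"], []).append(row["interval"])
    for year, intervals in sorted(intervals_by_year.items()):
        # Assumes intervals are numbered 1, 2, 3, ... with no gaps.
        if sorted(intervals) != list(range(1, len(intervals) + 1)):
            problems.append(f"{year}: intervals have gaps or duplicates")
    return problems

sample = """year,interval,n_at_risk,deaths,lost,expected_pct
1975,1,1000,200,0,98.5
1975,2,800,80,20,98.2
1976,1,1100,210,5,98.6
1976,3,700,60,10,98.0
"""
print(check_rows(prepare_rows(sample)))
```

In this hypothetical sample, the 1976 rows skip interval 2, so the check reports a gap for that year.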

Importing data using JPSurv workspace
  1. Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
  2. Confirm that Workspace is selected in the File Format section.
  3. Click the Browse button and select the .jpsurv workspace file.
  4. Press Import and the JPSurv session will load automatically.  

Fit a model with 1 Joinpoint to NHL data

Fitting a simple joinpoint survival model to NHL data
  1. Import NHL data using the appropriate option (Dic/Data, CSV, or Workspace)
  2. In the Year of Diagnosis Range box, specify the range of years that will be used to fit the JPSurv model. In this example, all years of diagnosis will be used.
  3. In the Max No. of Year from Diagnosis (follow-up) to include box, the user can specify the maximum intervals from diagnosis to select a subset of the input data to be used in analysis. In this example, the maximum length of follow-up allowed will be used.
  4. Select “Non-Hodgkin Lymphoma” as the cohort of interest
  5. For “Maximum joinpoints,” select 1 from the dropdown menu
  6. Click Calculate
  7. Examine the results. You will see that the model probably needs more joinpoints

Fit models with max no. of Joinpoint = 4 to NHL and CML

Fitting complex joinpoint survival models to NHL and CML data
  1. In the Cohort and Model Specifications box select both “Non-Hodgkin Lymphoma” and “Chronic Myeloid Leukemia.” In general, multiple cohorts can be selected at once, but this increases computation time, and an email address is required for a notification to be sent when results are available.  
  2. In “Maximum Joinpoints” select 4 from the dropdown menu.
  3. Click the Calculate button.
  4. Type your email
  5. When you receive the e-mail, click “View Results”
Advanced Options

Explore the results:

Users can export the cohort, model specification, and results, either to an Excel spreadsheet or a workspace file. Results can be exported to Excel via the “Download Full Dataset” option or to a workspace file via the “Export Workspace” option.

JPSurv selects the best-fitting model as the one with the minimum Bayesian Information Criterion (BIC). The Akaike Information Criterion (AIC) is also provided; AIC tends to select models with more joinpoints than BIC. Graphs and other output features are also available for the fitted models other than the final selected model.
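
The difference between the two criteria can be seen in a small Python sketch. It uses the textbook definitions of AIC and BIC, which is an assumption: JPSurv may count parameters or scale these quantities differently. The log-likelihoods, parameter counts, and number of observations below are hypothetical.

```python
import math

def aic(log_lik, n_params):
    """Textbook Akaike Information Criterion."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_obs):
    """Textbook Bayesian Information Criterion."""
    return n_params * math.log(n_obs) - 2 * log_lik

def select_by_bic(fits, n_obs):
    """Return the fit with the smallest BIC."""
    return min(fits, key=lambda f: bic(f["log_lik"], f["n_params"], n_obs))

def select_by_aic(fits):
    """Return the fit with the smallest AIC."""
    return min(fits, key=lambda f: aic(f["log_lik"], f["n_params"]))

# Hypothetical fits for models with 0, 1, and 2 joinpoints.
fits = [
    {"joinpoints": 0, "log_lik": -520.0, "n_params": 11},
    {"joinpoints": 1, "log_lik": -505.0, "n_params": 13},
    {"joinpoints": 2, "log_lik": -502.5, "n_params": 15},
]
n_obs = 430  # hypothetical number of year-by-interval cells

print("BIC selects:", select_by_bic(fits, n_obs)["joinpoints"], "joinpoint(s)")
print("AIC selects:", select_by_aic(fits)["joinpoints"], "joinpoint(s)")
```

With these made-up numbers, BIC selects the 1-joinpoint model and AIC selects the 2-joinpoint model, because BIC's penalty per parameter, log(n), is larger than AIC's penalty of 2 once n exceeds about 7.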

Graphs display predicted (modeled) and observed survival or interval probabilities of death for each joinpoint model and cohort. The default model displayed is the final selected model, but the user can select other fitted models. All JPSurv plots are created using the ggplot2 R package [3].

For certain graphs, users can display trend summary measures by checking the “Show Trend Measures” box and clicking “Recalculate.” The annotation feature is only available when there are 3 or fewer intervals and models have 3 or fewer joinpoints. Model estimates are displayed in terms of the number and location of joinpoints, parameter estimates, and standard errors. Data displayed in graphs may be downloaded using the “Download Graph Dataset” option.


Explore the joinpoint survival model results

  1. How many joinpoints does the final selected model for NHL have, and in which years are the joinpoints located?
    1. Answer: The final selected model has 4 joinpoints located at 1983, 1994, 2002, and 2012
    2. Note: Because the final model has 4 joinpoints, it would be advisable to test a model with 5 joinpoints
  2. In which periods is survival increasing vs. decreasing? Click the “Include Trend Measures -> Between Joinpoints” options and then click “Recalculate.”
    1. Answer: Survival is increasing for all periods except after 2012, the last identified joinpoint
  3. Include 1-year in the graph with 5-year and 10-year cumulative survival. In “Select years since diagnosis (follow-up) for survival plot and/or trend measures” select x1. Click “Recalculate.”
  4. Look at the Death vs. Year at Diagnosis graphs. Include the model and data for the 1-year probability of death
  5. Produce a graph that shows cumulative survival by time since diagnosis for patients diagnosed in 1975 and another graph for patients diagnosed in 2009.

Save the data and results

You can save all the results and the data by selecting either “Export Workspace” or “Download Full Dataset.” You can retrieve the results by selecting and opening the workspace via the Workspace option under “File Format.”


References

  1. Yu BB, Huang L, Tiwari RC, Feuer EJ, Johnson KA. Modelling population-based cancer survival trends by using join point models for grouped survival data. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172:405-25.
  2. Hakulinen T, Tenkanen L. Regression Analysis of Relative Survival Rates. Applied Statistics. 1987;36(3):309-17.
  3. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.