Background

JPSurv software was developed to analyze trends in survival with respect to the year of diagnosis [1]. Survival data include two time scales that must be taken into consideration: the calendar year of diagnosis and the time since diagnosis. The JPSurv model is an extension of the Cox proportional hazards model and of the method of Hakulinen and Tenkanen [2] for relative survival. JPSurv fits a proportional hazards joinpoint model to survival data on the log hazard scale. Joinpoint models consist of linear segments connected through “joinpoints,” which represent the times at which trends changed. The hazard of cancer death is specified as the product of a baseline hazard, on the time since diagnosis scale, and a multiplicative factor describing the effect of year of diagnosis and other covariates. The year of diagnosis effect is modeled as joined linear segments on the log scale. The number and location of joinpoints are estimated from the data. The model implies that the hazards of cancer death, as functions of time since diagnosis, are proportional for subjects diagnosed in different calendar years. The JPSurv software uses discrete-time survival data, i.e., data grouped by years since diagnosis in the life table format. The software can accommodate both relative survival and cause-specific survival.
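
As a rough illustration of the structure described above, the following Python sketch evaluates a joinpoint year-of-diagnosis effect on the log hazard scale and applies it multiplicatively to baseline interval hazards. The function names, the slope-change parameterization, and all numbers are assumptions made for illustration; they are not taken from the JPSurv source code or from any fitted model.

```python
import math

def year_effect(year, first_year, slope, joinpoints, slope_changes):
    """Log hazard ratio for a diagnosis year relative to first_year.

    The effect is piecewise linear and continuous: the slope changes by
    slope_changes[k] at joinpoints[k]. (Hypothetical parameterization.)
    """
    effect = slope * (year - first_year)
    for tau, delta in zip(joinpoints, slope_changes):
        effect += delta * max(0, year - tau)
    return effect

def cumulative_survival(baseline_hazards, log_hazard_ratio):
    """Survival at the end of each follow-up interval under proportional hazards.

    baseline_hazards[j] is an assumed cumulative baseline hazard within
    follow-up interval j + 1; each one is multiplied by exp(log_hazard_ratio).
    """
    ratio = math.exp(log_hazard_ratio)
    survival, total = [], 0.0
    for hazard in baseline_hazards:
        total += hazard * ratio
        survival.append(math.exp(-total))
    return survival

# Made-up values: the log hazard falls by 0.02 per year until 1995,
# then by 0.05 per year (slope -0.02 plus a change of -0.03).
baseline = [0.30, 0.15, 0.10, 0.08, 0.06]
effect_1975 = year_effect(1975, 1975, -0.02, [1995], [-0.03])
effect_2005 = year_effect(2005, 1975, -0.02, [1995], [-0.03])
surv_1975 = cumulative_survival(baseline, effect_1975)
surv_2005 = cumulative_survival(baseline, effect_2005)
```

Because the year effect multiplies every baseline interval hazard by the same factor, the two survival curves differ only through that single hazard ratio, which is the proportionality assumption stated above.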

The purpose of this tutorial is to provide a step-by-step illustration of how to use JPSurv to analyze changes in survival trends.

Objective

The goal is to estimate relative survival trends from 1975 onward for patients (any sex, any stage) diagnosed with Non-Hodgkin Lymphoma (NHL) or Chronic Myeloid Leukemias (CML). Note that we usually don’t select the last diagnosis year (2018 in our case) because some deaths among patients diagnosed in the last year may be missed, so survival may be overestimated.

Step 1: Extract the data from SEER*Stat

JPSurv can read in grouped relative or cause-specific survival data generated from SEER*Stat. The minimum variables required are: calendar year of diagnosis, survival time interval (i.e., years from diagnosis), number at risk at beginning of interval, number of deaths, number of cases lost to follow-up, and, in the case of relative survival, interval expected survival. Data may also include other covariates of interest, such as cancer site, sex, stage, etc.
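
To make the role of each required variable concrete, the sketch below computes interval and cumulative relative survival for one diagnosis-year cohort using the textbook actuarial life-table method, in which cases lost to follow-up are counted as at risk for half of the interval. The field names and numbers are hypothetical, and the calculation is not claimed to reproduce SEER*Stat or JPSurv internals.

```python
def life_table(rows):
    """Actuarial life table for one cohort.

    Each row is a dict with hypothetical keys: interval, at_risk, died,
    lost, and expected (interval expected survival as a proportion).
    """
    table = []
    cum_observed = 1.0
    cum_expected = 1.0
    for row in rows:
        # Cases lost to follow-up contribute half an interval at risk.
        effective = row["at_risk"] - row["lost"] / 2.0
        if effective <= 0:
            raise ValueError(f"no one effectively at risk in interval {row['interval']}")
        observed = 1.0 - row["died"] / effective
        cum_observed *= observed
        cum_expected *= row["expected"]
        table.append({
            "interval": row["interval"],
            "observed": observed,
            "relative": observed / row["expected"],
            "cum_relative": cum_observed / cum_expected,
        })
    return table

# Made-up counts for the first two years since diagnosis.
cohort = [
    {"interval": 1, "at_risk": 1000, "died": 200, "lost": 20, "expected": 0.97},
    {"interval": 2, "at_risk": 780, "died": 100, "lost": 10, "expected": 0.96},
]
table = life_table(cohort)
```

For cause-specific survival the expected survival column is not needed; in this sketch that would correspond to setting expected to 1.0 in every row.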

Opening the survival session, selecting the database, and selecting the cases

Open the survival session
Select database: Select the SEER*Stat database (SEER Research Data, 9 registries (1975-2018) Nov 2020 submission)
Select Relative survival: Go to the STATISTICS tab and select “Relative survival”
Selection of years of diagnosis 1975-2017 and cancer sites. Go to the SELECTION tab. In the CASE SELECTION box click EDIT. In the VARIABLE box go to:
1. “Race, Sex, Year Dx” folder. Click Year of diagnosis and select years 1975 through 2017
2. “Site and Morphology” folder. Click Site Recode ICD-O-3/WHO 2008. Select Non-Hodgkin Lymphoma and Chronic Myeloid Leukemias

Defining the intervals for calculations

Define 10 annual (or 12 month) intervals for calculation of relative survival and the output format
1. In the INTERVAL box in PARAMETERS tab, for “Month Per” select or type 12. For “Number” select 10.
2. In the DISPLAY Box check Standard Life.

Defining the intervals for calculations using the CANSURV/JPSurv Output format

1. Alternatively, for step 5 you can select Session in the main menu and select CANSURV/JPSurv output. This selection will automatically set the interval to 12 months. You still need to define 10 intervals (instead of the default of 5).

Creating the year of diagnosis and cancer site variables for stratification

Include the variables that will be used to stratify survival calculations. Go to the TABLE tab.
1. Create a new variable that will include single year of diagnosis from 1975 through 2017. Click on the “Race, Sex, Year Dx” folder. Click Year of diagnosis and then the CREATE button. Delete 1975-2018 and 2018 values and save it with a new name. Add this as a page, row, or column (it does not matter which)
2. Create a new variable that will include only 2 sites NHL and CML. Using the same process, click on the “Site and Morphology” folder. Click Site Recode ICD-O-3/WHO 2008 and then the CREATE button. Delete all values, select NHL and CML. Save it with a new name. Add this as a page, row, or column (it does not matter which)
Run SEER*Stat

Saving and Exporting

Save the results and export the matrix. Go to the Matrix in the main menu. Matrix -> Export -> Results as Text File

Step 2: Reading the data into JPSurv, specifying parameters, and running JPSurv

Importing data using Dic/Data files

Data sets generated in SEER*Stat must include year at diagnosis as a covariate. To reflect trends in calendar years, 12 months per interval needs to be specified. In addition, the “Display/Standard Life” option in the Parameters tab must be checked as JPSurv does not read the “Summary Table” format. Data can be exported directly from SEER*Stat and saved as a .txt file with a corresponding dictionary .dic file.

Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
Confirm that Dic/Data Files is selected in the File Format section.
Click the Choose Files button and select the .txt and .dic files exported in Step 1. Note that you need to select both at the same time.
Press Upload Input Files button

Importing data using CSV files

JPSurv can also read data from delimited text files with common delimiters (comma, semicolon, or tab). Users will need to have some prior knowledge of the data stored in each column to correctly use JPSurv. The data requires at minimum: calendar year of diagnosis, survival time interval, number at risk at beginning of interval, number of deaths, number of cases lost to follow-up, and, in the case of relative survival, interval expected survival. Data may also include other covariates of interest, such as cancer site, sex, stage, etc. Note that the survival time interval should be at equal intervals and should not have any gaps. Note that when using a delimited text file as input to JPSurv, the user must convert the interval expected survival column to Proportions from Percentages as the acceptable range of inputs is [0,1].
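
The two notes above (no gaps in the survival time intervals, and expected survival given as proportions) can be checked before uploading. The following is a minimal sketch using only the Python standard library; the column names year, interval, and expected_survival are hypothetical, and it assumes intervals are numbered 1, 2, 3, and so on within each year of diagnosis.

```python
import csv
import io

def load_grouped_survival(text, delimiter=",", expected_in_percent=True):
    """Parse delimited text and return rows with expected survival in [0, 1]."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for record in reader:
        expected = float(record["expected_survival"])
        if expected_in_percent:
            expected /= 100.0  # convert percentages to proportions
        if not 0.0 <= expected <= 1.0:
            raise ValueError(f"expected survival outside [0, 1]: {expected}")
        rows.append({
            "year": int(record["year"]),
            "interval": int(record["interval"]),
            "expected_survival": expected,
        })
    return rows

def check_intervals(rows):
    """Raise ValueError if any year's intervals are not 1..K without gaps."""
    by_year = {}
    for row in rows:
        by_year.setdefault(row["year"], []).append(row["interval"])
    for year, intervals in by_year.items():
        if sorted(intervals) != list(range(1, len(intervals) + 1)):
            raise ValueError(f"gap or duplicate in intervals for {year}: {sorted(intervals)}")

# Made-up sample with expected survival stored as percentages.
sample = (
    "year,interval,expected_survival\n"
    "1975,1,97.5\n"
    "1975,2,96.8\n"
    "1976,1,97.6\n"
    "1976,2,96.9\n"
)
rows = load_grouped_survival(sample)
check_intervals(rows)
```

A real file would also carry the number at risk, deaths, and cases lost to follow-up; they are omitted here to keep the check short.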

Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
Confirm that CSV Files is selected in the File Format section.
Click the Browse button and select the .csv or .txt file exported in Step 1.
Once the file is selected a new window will pop up for CSV Configuration
Choose the appropriate delimiter
If the file contains headers, make sure the box titled “Does the file contain headers?” is checked. Otherwise, uncheck the box.
Choose the appropriate data type, either Relative Survival or Cause-Specific Survival.
Choose the appropriate display option for rates, either Percents or Proportions.
You can change the desired number of rows to display
Choose the drop-down menu for each column and select the parameter corresponding to each column. Note that all required parameters must be mapped to the appropriate column.
Press Save.
Press Upload.

Importing data using JPSurv workspace

Open JPSurv by searching for “JPSurv” and “NCI” keywords or entering the URL https://analysistools.cancer.gov/jpsurv/
Confirm that Workspace is selected in the File Format section.
Click the Browse button and select the .jpsurv workspace file.
Press Import and the JPSurv session will load automatically.

Fit a model with 1 Joinpoint to NHL data

Fitting a simple joinpoint survival model to NHL data

Import NHL data using the appropriate option (Dic/Data, CSV, or Workspace)
In the Year of Diagnosis Range box, specify the number of years that will be used to fit the JPSurv model. In this example, all years of diagnosis will be used.
In the Max No. of Year from Diagnosis (follow-up) to include box, the user can specify the maximum intervals from diagnosis to select a subset of the input data to be used in analysis. In this example, the maximum length of follow-up allowed will be used.
Select “Non-Hodgkin Lymphoma” as the cohort of interest
For “Maximum joinpoints,” select 1 from the dropdown menu
Click Calculate
Examine the results. You will see that the model probably needs more joinpoints

Fit models with max no. of Joinpoint = 4 to NHL and CML

Fitting complex joinpoint survival models to NHL and CML data

In the Cohort and Model Specifications box select both “Non-Hodgkin Lymphoma” and “Chronic Myeloid Leukemia.” In general, multiple cohorts can be selected at once, but this increases computation time, and an email address is required for a notification to be sent when results are available.
In “Maximum Joinpoints” select 4 from the dropdown menu.
Click the Calculate button.
Enter your email address
When you receive the e-mail, click “View Results”

Advanced Options

Delete Last Interval: The last interval can be deleted in case there is data instability in the last follow-up interval.
Minimum Number of Years between Joinpoints (Excluding Joinpoints): If x is selected, consecutive joinpoints will be at least x years apart. The default value is 2.
Minimum Number of Years before First Joinpoint (Excluding Joinpoint): If x is selected, the first joinpoint may be located at the (x+1)th or later calendar year. The default value is 3.
Minimum Number of Years after Last Joinpoint (Excluding Joinpoint): If x is selected, the last joinpoint can be located at (x+1) or more calendar years prior to the last calendar year. The default value is 5.
Number of Calendar Years of Projected Survival: This specifies the calculation of projected survival up to x years from the last calendar year. The default value is 5.

Explore the results:

Users can export the cohort, model specification, and results, either to an Excel spreadsheet or a workspace file. Results can be exported to Excel via the “Download Full Dataset” option or to a workspace file via the “Export Workspace” option.

JPSurv uses the minimum Bayesian Information Criterion (BIC) to select the best fitted model. The Akaike Information Criterion (AIC) is also provided. AIC tends to select models with a higher number of joinpoints. Graphs and other output features are available for other fitted models beyond the final selected model.
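
The difference between the two criteria can be seen from their textbook definitions, AIC = -2 log L + 2k and BIC = -2 log L + k ln(n), where k is the number of parameters and n the number of observations. The sketch below applies them to made-up log-likelihoods; the parameter counts (two extra parameters per joinpoint) and the value of n are assumptions, and JPSurv's exact likelihood and counting conventions may differ.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion (textbook form)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (textbook form)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fits: joinpoints -> (log-likelihood, number of parameters).
fits = {
    0: (-5230.0, 11),
    1: (-5205.0, 13),
    2: (-5200.0, 15),
    3: (-5199.5, 17),
}
n_obs = 430  # assumed: 43 diagnosis years x 10 follow-up intervals

aic_by_model = {k: aic(ll, p) for k, (ll, p) in fits.items()}
bic_by_model = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
best_by_aic = min(aic_by_model, key=aic_by_model.get)
best_by_bic = min(bic_by_model, key=bic_by_model.get)
```

With these numbers the per-parameter penalty is 2 for AIC and ln(430), roughly 6.06, for BIC, so the small likelihood gain from a second joinpoint is enough for AIC but not for BIC.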

Graphs display predicted (modeled) and observed survival or interval probabilities of death for each joinpoint model and cohorts. The default model displayed is the final selected model. However, the user can select other fitted models. The user can check “Show Trend Measures” and hit “Recalculate” to display the trend summary measures. All JPSurv plots are created using the ggplot2 R package [3].

Survival vs. Year of Diagnosis Graphs: Users can select one or more values of interval years and produce the trend graph over all available years of diagnosis. The default is 5-year survival.
- Trend measure: Average Absolute Change in Survival by Diagnosis Year. These numbers represent the average absolute difference in survival (either relative or cause-specific) for individuals diagnosed in one calendar year compared to the prior year. This measure depends on calendar year and the time since diagnosis as selected by the user. The average over calendar years is reported.
Death vs. Year of Diagnosis Graph: The user can select one or more values of death interval years and produce the trend graph over all available years of diagnosis. The default is 5-year probability of death interval, which represents, given being alive at the end of the fourth year, the probability of dying of cancer between the 4th and 5th year from diagnosis.
- Trend measure: Percent Change in the Interval Probability of Dying of Cancer by Diagnosis Year. These numbers represent the percent change in the interval probability of dying of cancer for those diagnosed in one calendar year compared to the prior year. Because the fitted model assumes proportional hazards, this trend measure is independent of time since diagnosis. Thus, it is the same for probabilities of dying in any interval since diagnosis.
Survival vs. Time Since Diagnosis Graph: Users can select one or more calendar years and display modeled vs. observed survival by years since diagnosis.
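
The quantities behind these graphs and trend measures can be sketched in a few lines of Python. The functions below are illustrative only: the names and numbers are hypothetical, and the percent-change formula assumes that the measure is the year-over-year hazard ratio implied by a segment slope on the log hazard scale, which may not match JPSurv's exact computation.

```python
import math

def interval_death_probability(cum_survival, interval):
    """Probability of dying in a follow-up interval, given survival to its start.

    cum_survival[j] is cumulative survival at the end of interval j + 1;
    interval is 1-based.
    """
    previous = 1.0 if interval == 1 else cum_survival[interval - 2]
    return 1.0 - cum_survival[interval - 1] / previous

def average_absolute_change(survival_by_year):
    """Mean difference in survival between consecutive diagnosis years."""
    diffs = [later - earlier
             for earlier, later in zip(survival_by_year, survival_by_year[1:])]
    return sum(diffs) / len(diffs)

def percent_change_in_hazard(slope):
    """Percent change per diagnosis year implied by a log-hazard slope."""
    return 100.0 * (math.exp(slope) - 1.0)

# Made-up cumulative survival at 1 to 5 years since diagnosis.
cum_survival = [0.80, 0.70, 0.63, 0.58, 0.55]
death_5th_year = interval_death_probability(cum_survival, 5)

# Made-up 5-year survival for four consecutive diagnosis years.
five_year_survival = [0.50, 0.52, 0.55, 0.56]
avg_change = average_absolute_change(five_year_survival)

pct_change = percent_change_in_hazard(-0.02)
```

Note that percent_change_in_hazard takes no follow-up interval as an argument, which mirrors the statement that this trend measure does not depend on time since diagnosis, whereas the absolute change in survival does.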

For certain graphs, users can display trend measures by checking the box “Show Trend Measures.” The annotation feature is only available when there are 3 or fewer intervals and models have 3 or fewer joinpoints. Model estimates are displayed in terms of the number and location of joinpoints, parameter estimates, and standard errors. Data displayed in graphs may be downloaded using the “Download Graph Dataset” option.

Explore the joinpoint survival model results

How many joinpoints does the final selected model for NHL have, and in which years are the joinpoints located?
1. Answer: The final selected model has 4 joinpoints located at 1983, 1994, 2002, and 2012
2. Note: Because the final model has 4 joinpoints, which is the maximum number allowed in this run, it would be advisable to test a model with 5 joinpoints
In which periods is survival increasing vs. decreasing? Click the “Include Trend Measures -> Between Joinpoints” options and then click “Recalculate.”
1. Answer: Survival is increasing for all periods except after 2012, the last identified joinpoint
Include 1-year in the graph with 5-year and 10-year cumulative survival. In “Select years since diagnosis (follow-up) for survival plot and/or trend measures” select x1. Click “Recalculate.”
Look at the Death vs. Year at Diagnosis graphs. Include the model and data for the 1-year probability of death
Produce a graph that shows cumulative survival by time since diagnosis for patients diagnosed in 1975 and another graph for patients diagnosed in 2009.

Save the data and results

You can save all the results and the data by selecting either “Export Workspace” or “Download Full Dataset.” You can retrieve the results by selecting and opening the workspace via the Workspace option under “File Format.”

References

Yu BB, Huang L, Tiwari RC, Feuer EJ, Johnson KA. Modelling population-based cancer survival trends by using join point models for grouped survival data. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172:405-25.
Hakulinen T, Tenkanen L. Regression Analysis of Relative Survival Rates. Applied Statistics. 1987;36(3):309-17.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2009.