JPSurv software was developed to analyze trends in survival with respect to
the year of diagnosis [1]. Survival data includes two time scales that must
be taken into consideration: the calendar year of diagnosis and the time
since diagnosis. The JPSurv model is an extension of the Cox proportional
hazards model and of the Hakulinen and Tenkanen model [2] for relative survival.
JPSurv fits a proportional hazards joinpoint model to survival data on the
log hazard scale. Joinpoint models consist of linear segments connected
through “joinpoints,” which represent the times at which trends
changed. The hazard of cancer death is specified as the product of a
baseline hazard, on the time since diagnosis scale, and a multiplicative
factor describing the effect of year of diagnosis and other covariates. The
year of diagnosis effect is modeled as joined linear segments on the log
scale. The number and location of joinpoints are estimated from data. The
model implies that the probability (hazard) of cancer death as a function of
time since diagnosis is proportional for subjects diagnosed in different
calendar years. The JPSurv software uses discrete-time survival data, i.e.,
data grouped by years since diagnosis in the life table format. The software
can accommodate both relative survival and cause-specific survival.
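As a rough illustration of the model structure described above, the following Python sketch builds the year-of-diagnosis effect from joined linear segments on the log hazard scale and applies it to a baseline under proportional hazards. It is not JPSurv's implementation: the function names, the parameterization, and all numeric values are hypothetical. The grouped-data form used here (each baseline interval survival probability raised to the power of the hazard multiplier) is the standard consequence of proportional hazards, not something taken from the software.

```python
import math


def log_hazard_effect(year, base_year, slope, joinpoints, slope_changes):
    """Year-of-diagnosis effect on the log hazard scale (hypothetical parameterization).

    The effect is linear in calendar year, and the slope changes by
    slope_changes[k] after joinpoints[k], so the segments stay connected.
    """
    effect = slope * (year - base_year)
    for tau, delta in zip(joinpoints, slope_changes):
        effect += delta * max(0, year - tau)
    return effect


def cumulative_survival(baseline_interval_survival, hazard_ratio):
    """Cumulative survival by years since diagnosis under proportional hazards."""
    survival, curve = 1.0, []
    for p0 in baseline_interval_survival:
        # Proportional hazards: interval survival is the baseline raised to the hazard ratio.
        survival *= p0 ** hazard_ratio
        curve.append(survival)
    return curve


# Illustrative values only, not estimates from any data set.
baseline = [0.80, 0.90, 0.95]
trend = dict(base_year=1975, slope=-0.02, joinpoints=[1990], slope_changes=[-0.03])

for year in (1975, 1990, 2000):
    ratio = math.exp(log_hazard_effect(year, **trend))
    print(year, [round(s, 3) for s in cumulative_survival(baseline, ratio)])
```

With a negative slope, the hazard multiplier falls below 1 for later diagnosis years, so the whole survival curve shifts upward while keeping the same shape on the time-since-diagnosis scale.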
The purpose of this tutorial is to provide a step-by-step illustration of
how to use JPSurv to analyze changes in survival trends.
Objective
The goal is to estimate relative survival trends from 1975 for patients (any
sex, any stage) diagnosed with Non-Hodgkin Lymphoma (NHL) or Chronic Myeloid
Leukemias (CML). Note that we usually don’t select the last diagnosis year
(2018 in our case) because some deaths among patients diagnosed in that year
may be missed, so survival may be overestimated.
Step 1: Extract the data from SEER*Stat
JPSurv can read in grouped relative or cause-specific survival data
generated from SEER*Stat. The minimum variables required are: calendar year
of diagnosis, survival time interval (i.e., years from diagnosis), number at
risk at beginning of interval, number of deaths, number of cases lost to
follow-up, and, in the case of relative survival, interval expected
survival. Data may also include other covariates of interest, such as cancer
site, sex, stage, etc.
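To make the role of each required variable concrete, here is a minimal Python sketch of how they combine into relative survival for one diagnosis year. It assumes the actuarial life table method (cases lost to follow-up count as at risk for half the interval) and defines interval relative survival as observed divided by expected survival. The dictionary keys and the counts are hypothetical, and SEER*Stat's own calculation may differ in detail.

```python
def relative_survival_table(rows):
    """Compute interval and cumulative relative survival from grouped data.

    rows: life table rows for one diagnosis year, ordered by interval.
    Each row is a dict with hypothetical keys: interval, alive_at_start,
    died, lost_to_followup, expected_interval_survival (a proportion).
    """
    cumulative = 1.0
    table = []
    for row in rows:
        # Actuarial adjustment: lost cases are at risk for half the interval.
        effective_at_risk = row["alive_at_start"] - row["lost_to_followup"] / 2.0
        observed = 1.0 - row["died"] / effective_at_risk
        relative = observed / row["expected_interval_survival"]
        cumulative *= relative
        table.append({
            "interval": row["interval"],
            "observed": observed,
            "relative": relative,
            "cumulative_relative": cumulative,
        })
    return table


# Made-up counts for a single diagnosis year.
example_rows = [
    {"interval": 1, "alive_at_start": 1000, "died": 200,
     "lost_to_followup": 20, "expected_interval_survival": 0.98},
    {"interval": 2, "alive_at_start": 780, "died": 78,
     "lost_to_followup": 20, "expected_interval_survival": 0.98},
]

for line in relative_survival_table(example_rows):
    print(line)
```

For cause-specific survival the expected survival column is not needed, which corresponds to setting it to 1 in this sketch.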
Opening the survival session, selecting the database, and selecting the
cases
Open the survival session
Select database: Select the SEER*Stat database (SEER Research Data,
9 registries (1975-2018) Nov 2020 submission)
Select Relative survival: Go to the STATISTICS tab and select
“Relative survival”
Selection of years of diagnosis 1975-2017 and cancer sites. Go to
the SELECTION tab. In the CASE SELECTION box click EDIT. In the VARIABLE
box go to:
“Race, Sex, Year Dx” folder. Click Year of diagnosis and
select years 1975 through 2017
“Site and Morphology” folder. Click Site Recode
ICD-O-3/WHO 2008. Select Non-Hodgkin Lymphoma and Chronic Myeloid
Leukemias
Defining the intervals for calculations
Define 10 annual (or 12 month) intervals for calculation of relative
survival and the output format
In the INTERVAL box in the PARAMETERS tab, for “Month Per”
select or type 12. For “Number” select 10.
In the DISPLAY box check Standard Life.
Defining the intervals for calculations using the CANSURV/JPSurv Output
format
Alternatively, for step 5 you can select Session in the main menu and
select CANSURV/JPSurv output. This selection will automatically set
the interval to 12 months. You still need to define 10 intervals
(instead of the default of 5).
Creating the year of diagnosis and cancer site variables for
stratification
Include the variables that will be used to stratify survival
calculations.
Go to the TABLE tab.
Create a new variable that will include single year of diagnosis
from 1975 through 2017.
Click on the “Race, Sex, Year Dx” folder. Click Year of
diagnosis and then the CREATE button. Delete 1975-2018 and 2018 values
and save it with a new name. Add this as a page, row, or column (it
does not matter which)
Create a new variable that will include only 2 sites NHL and
CML.
Using the same process, click on the “Site and Morphology”
folder. Click Site Recode ICD-O-3/WHO 2008 and then the CREATE button.
Delete all values, select NHL and CML. Save it with a new name. Add
this as a page, row, or column (it does not matter which)
Run SEER*Stat
Saving and Exporting
Save the results and export the matrix. Go to the Matrix in the main
menu. Matrix -> Export -> Results as Text File
Step 2: Reading the data into JPSurv, specifying parameters, and running
JPSurv
Importing data using Dic/Data files
Data sets generated in SEER*Stat must include year at diagnosis as a
covariate. To reflect trends in calendar years, 12 months per interval needs
to be specified. In addition, the “Display/Standard Life” option
in the Parameters tab must be checked as JPSurv does not read the
“Summary Table” format. Data can be exported directly from
SEER*Stat and saved as a .txt file with a corresponding dictionary .dic
file.
Confirm that Dic/Data Files is selected in the File Format section.
Click the Choose Files button and select the .txt and .dic files exported
in Step 1. Note that you need to select both at the same time.
Press Upload Input Files button
Importing data using CSV files
JPSurv can also read data from delimited text files with common delimiters
(comma, semicolon, or tab). Users will need to have some prior knowledge of
the data stored in each column to correctly use JPSurv. The data requires at
minimum: calendar year of diagnosis, survival time interval, number at risk
at beginning of interval, number of deaths, number of cases lost to
follow-up, and, in the case of relative survival, interval expected
survival. Data may also include other covariates of interest, such as cancer
site, sex, stage, etc. The survival time intervals must be of equal length
and must not have any gaps. When using a delimited text file as input to
JPSurv, the user must convert the interval expected survival column from
percentages to proportions, because the acceptable range of inputs is
[0,1].
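The two checks described above can be run before uploading a file. The Python sketch below converts expected survival from percentages to proportions and verifies that each diagnosis year has consecutive intervals with no gaps. The column names (year, interval, expected_survival) and the sample rows are hypothetical, and the sketch assumes intervals are numbered 1, 2, 3, and so on. Adjust both to match your own file.

```python
import csv
import io


def load_grouped_survival(text, delimiter=",", expected_in_percent=True):
    """Read delimited survival data, rescale expected survival, and check for gaps."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    intervals_by_year = {}
    for row in rows:
        expected = float(row["expected_survival"])
        if expected_in_percent:
            expected /= 100.0  # percentages -> proportions
        if not 0.0 <= expected <= 1.0:
            raise ValueError(f"expected survival out of [0,1]: {expected}")
        row["expected_survival"] = expected
        intervals_by_year.setdefault(row["year"], []).append(int(row["interval"]))
    for year, intervals in intervals_by_year.items():
        # Intervals must run 1, 2, ..., n with no missing values.
        if sorted(intervals) != list(range(1, len(intervals) + 1)):
            raise ValueError(f"gap in survival intervals for year {year}")
    return rows


# Made-up rows; a real file would also hold the other required columns' true values.
SAMPLE = """year,interval,alive_at_start,died,lost,expected_survival
1975,1,1000,200,20,98.5
1975,2,780,78,20,98.2
1976,1,1100,210,25,98.6
1976,2,865,80,22,98.3
"""

for row in load_grouped_survival(SAMPLE):
    print(row["year"], row["interval"], row["expected_survival"])
```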
Confirm that CSV Files is selected in the File Format section.
Click the Browse button and select the .csv or .txt file exported in Step
1.
Once the file is selected a new window will pop up for CSV Configuration
Choose the appropriate delimiter
If the file contains headers, make sure the box titled “Does the
file contain headers?” is checked. Otherwise, uncheck the box.
Choose the appropriate data type, either Relative Survival or
Cause-Specific Survival.
Choose the appropriate display option for rates, either Percents or
Proportions.
You can change the desired number of rows to display
Choose the drop-down menu for each column and select the parameter
corresponding to each column. Note that all required parameters must be
mapped to the appropriate column.
Importing data using Workspace files
Confirm that Workspace is selected in the File Format section.
Click the Browse button and select the .jpsurv workspace file.
Press Import and the JPSurv session will load automatically.
Fit a model with 1 Joinpoint to NHL data
Fitting a simple joinpoint survival model to NHL data
Import NHL data using the appropriate option (Dic/Data, CSV, or Workspace)
In the Year of Diagnosis Range box, specify the range of diagnosis years
that will be used to fit the JPSurv model. In this example, all years of
diagnosis will be used.
In the Max No. of Year from Diagnosis (follow-up) to include box, the user
can specify the maximum number of intervals from diagnosis to select a subset of the
input data to be used in analysis. In this example, the maximum length of
follow-up allowed will be used.
Select “Non-Hodgkin Lymphoma” as the cohort of interest
For “Maximum joinpoints,” select 1 from the dropdown menu
Click Calculate
Examine the results. You will see that the model probably needs more
joinpoints
Fit models with max no. of Joinpoint = 4 to NHL and CML
Fitting complex joinpoint survival models to NHL and CML data
In the Cohort and Model Specifications box select both “Non-Hodgkin
Lymphoma” and “Chronic Myeloid Leukemia.” In general,
multiple cohorts can be selected at once, but this increases computation
time, and an email address is required for a notification to be sent when
results are available.
In “Maximum Joinpoints” select 4 from the dropdown menu.
Click the Calculate button.
Type your email
When you receive the e-mail, click “View Results”
Advanced Options
Delete Last Interval: The last follow-up interval can be deleted if
its data are unstable.
Minimum Number of Years between Joinpoints (Excluding Joinpoints):
If x is selected, joinpoints will be at least x years apart. The default
value is 2.
Minimum Number of Years before First Joinpoint (Excluding
Joinpoint):
If x is selected, the first joinpoint may be located at the (x+1)th or
later calendar year. The default value is 3.
Minimum Number of Years after Last Joinpoint (Excluding Joinpoint):
If x is selected, the last joinpoint can be located at (x+1) or more
calendar years prior to the last calendar year. The default value is 5.
Number of Calendar Years of Projected Survival:
This specifies the calculation of projected survival up to x years from
the last calendar year. The default value is 5.
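The three spacing options above restrict where joinpoints may fall. The Python sketch below enumerates the admissible joinpoint locations under one reading of those options, which is an assumption and not taken from the JPSurv source: "excluding joinpoints" is read as counting only the calendar years strictly before, between, or after the joinpoints. The function and parameter names are hypothetical.

```python
from itertools import combinations


def admissible_joinpoints(first_year, last_year, n_joinpoints,
                          min_before_first=3, min_between=2, min_after_last=5):
    """List joinpoint sets allowed by the spacing options (assumed interpretation)."""
    earliest = first_year + min_before_first  # the (x+1)th calendar year
    latest = last_year - min_after_last       # x calendar years follow the joinpoint
    candidates = range(earliest, latest + 1)
    valid = []
    for combo in combinations(candidates, n_joinpoints):
        # Count only the years strictly between consecutive joinpoints.
        if all(b - a - 1 >= min_between for a, b in zip(combo, combo[1:])):
            valid.append(combo)
    return valid


single = admissible_joinpoints(1975, 2017, 1)
print(single[0], single[-1], len(single))
print((1983, 1994, 2002, 2012) in admissible_joinpoints(1975, 2017, 4))
```

Under this reading and the default values, a single joinpoint for data diagnosed in 1975 through 2017 can fall anywhere from 1978 to 2012.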
Explore the results:
Users can export the cohort, model specification, and results, either to an
Excel spreadsheet or a workspace file. Results can be exported to Excel via
the “Download Full Dataset” option or to a workspace file via
the “Export Workspace” option.
JPSurv selects the best-fitting model as the one with the minimum Bayesian
Information Criterion (BIC). The Akaike Information Criterion (AIC) is also
provided. AIC tends to select models with more joinpoints than BIC. Graphs and
other output features are available for other fitted models beyond the final
selected model.
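The difference between the two criteria can be seen in a small Python sketch. It uses the textbook definitions, AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, which may not match the exact formulas JPSurv applies. The log-likelihoods, parameter counts, and number of observations are invented for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Textbook Akaike Information Criterion."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Textbook Bayesian Information Criterion."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_model(fits, n_obs, criterion="bic"):
    """Return the fit with the smallest value of the chosen criterion."""
    if criterion == "bic":
        return min(fits, key=lambda f: bic(f["loglik"], f["n_params"], n_obs))
    return min(fits, key=lambda f: aic(f["loglik"], f["n_params"]))


# Hypothetical fits: each added joinpoint adds two parameters here.
fits = [
    {"joinpoints": 0, "loglik": -1000.0, "n_params": 11},
    {"joinpoints": 1, "loglik": -990.0, "n_params": 13},
    {"joinpoints": 2, "loglik": -987.0, "n_params": 15},
]

print("BIC picks", select_model(fits, n_obs=430)["joinpoints"], "joinpoint(s)")
print("AIC picks", select_model(fits, n_obs=430, criterion="aic")["joinpoints"], "joinpoint(s)")
```

Because log(n) exceeds 2 for any realistic sample size, BIC charges more per parameter than AIC, which is why AIC tends to favor the model with more joinpoints.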
Graphs display predicted (modeled) and observed survival or interval
probabilities of death for each joinpoint model and cohort. The default
model displayed is the final selected model. However, the user can select
other fitted models. The user can check “Show Trend Measures”
and hit “Recalculate” to display the trend summary measures. All
JPSurv plots are created using the ggplot2 R package [3].
Survival vs. Year of Diagnosis Graphs: Users can select one or more
values of interval years and produce the trend graph over all available
years of diagnosis. The default is 5-year survival.
Trend measure: Average Absolute Change in Survival by Diagnosis
Year: These numbers represent the average absolute difference in survival
(either relative or cause-specific) for individuals diagnosed in one
calendar year compared to the prior year. This measure depends on
calendar year and the time since diagnosis as selected by the user.
The average over calendar years is reported.
Death vs. Year of Diagnosis Graph: The user can select one or more
values of death interval years and produce the trend graph over all
available years of diagnosis. The default is 5-year probability of death
interval, which represents, given being alive at the end of the fourth
year, the probability of dying of cancer between the 4th and
5th year from diagnosis.
Trend measure: Percent Change in the Interval Probability of Dying
of Cancer by Diagnosis Year:
These numbers represent the percent change in the interval probability
of dying of cancer for those diagnosed in one calendar year compared
to the prior year. Because the fitted model assumes proportional
hazards, this trend measure is independent of time since diagnosis.
Thus, it is the same for probabilities of dying in any interval since
diagnosis.
Survival vs. Time Since Diagnosis Graph: Users can select one or
more calendar years and display modeled vs. observed survival by years
since diagnosis.
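The quantities behind these graphs can be sketched in a few lines of Python. This is an illustration of the definitions given above, not JPSurv's code, and the function names and input values are hypothetical. The last function assumes that the percent change measure is derived from the slope of a log hazard segment as 100 * (exp(slope) - 1), the usual conversion for log-linear trends. The tutorial does not state the exact formula.

```python
import math


def average_absolute_change(survival_by_year):
    """Mean year-over-year difference in modeled survival over consecutive years."""
    years = sorted(survival_by_year)
    diffs = [survival_by_year[b] - survival_by_year[a]
             for a, b in zip(years, years[1:])]
    return sum(diffs) / len(diffs)


def interval_probability_of_death(cumulative_survival, interval):
    """Probability of dying in an interval, given survival to its start.

    cumulative_survival[j - 1] holds cumulative survival at j years.
    """
    previous = 1.0 if interval == 1 else cumulative_survival[interval - 2]
    return 1.0 - cumulative_survival[interval - 1] / previous


def percent_change_from_slope(slope):
    """Annual percent change implied by a log hazard slope (assumed formula)."""
    return 100.0 * (math.exp(slope) - 1.0)


# Illustrative values only.
five_year_survival = {2000: 0.60, 2001: 0.62, 2002: 0.65}
curve = [0.80, 0.72, 0.684, 0.65, 0.624]

print(average_absolute_change(five_year_survival))
print(interval_probability_of_death(curve, 5))
print(percent_change_from_slope(-0.02))
```

The slope-based measure takes no argument for years since diagnosis, which mirrors the statement that it is the same for any interval.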
For certain graphs, users can display trend measures by checking the box
“Show Trend Measures.” The annotation feature is only available
when there are 3 or fewer intervals and models have 3 or fewer joinpoints.
Model estimates are displayed in terms of the number and location of
joinpoints, parameter estimates, and standard errors. Data displayed in
graphs may be downloaded using the “Download Graph Dataset”
option.
Explore the joinpoint survival model results
How many joinpoints does the final selected model for NHL have, and in
which years are the joinpoints located?
Answer: The final selected model has 4 joinpoints located at 1983,
1994, 2002, and 2012
Note: Because the final model has 4 joinpoints, it would be advisable
to test a model with 5 joinpoints
In which periods is survival increasing vs. decreasing? Click the
“Include Trend Measures -> Between Joinpoints” options and
then click “Recalculate.”
Answer: Survival is increasing for all periods except after 2012, the
last identified joinpoint
Include 1-year survival in the graph with 5-year and 10-year cumulative
survival. In “Select years since diagnosis (follow-up) for survival plot
and/or trend measures” select 1. Click
“Recalculate.”
Look at the Death vs. Year at Diagnosis graphs. Include the model and data
for the 1-year probability of death
Produce a graph that shows cumulative survival by time since diagnosis for
patients diagnosed in 1975 and another graph for patients diagnosed in
2009.
Save the data and results
You can save all the results and the data by selecting either “Export
Workspace” or “Download Full Dataset.” You can retrieve
the results by selecting and opening the workspace via the Workspace option
under “File Format.”
References
[1] Yu BB, Huang L, Tiwari RC, Feuer EJ, Johnson KA. Modelling
population-based cancer survival trends by using join point models for
grouped survival data. Journal of the Royal Statistical Society: Series A
(Statistics in Society). 2009;172:405-25.
[2] Hakulinen T, Tenkanen L. Regression Analysis of Relative Survival Rates.
Applied Statistics. 1987;36(3):309-17.
[3] Wickham H. ggplot2: Elegant Graphics for Data Analysis. New York:
Springer-Verlag; 2009.