The analysis of quality-adjusted lifetime adds an interesting wrinkle to the field of dynamic treatment regimes (DTRs): the optimal regime depends not only on patient information (including treatments taken, intermediate outcomes, and other patient covariates) but also on characteristics of the treatments themselves, e.g., monetary cost or toxicity. This paper investigates a form of Q-learning that uses estimating equations for the quality-adjusted survival outcome. We use the m-out-of-n bootstrap for inference and threshold utility analysis to show how the patient-specific optimal regime varies with treatment characteristics (e.g., cost, side effects). The proposed methods are evaluated in a simulation study and applied to construct optimal treatment regimes for children’s neuroblastoma.