We show that the performance of Automatic Speech Recognition (ASR) systems that use semi-supervised speech representations can be boosted by a complementary pitch accent detection module, by introducing a joint ASR and pitch accent detection model. The pitch accent detection component of our model achieves a significant improvement on the state-of-the-art for the task, closing the gap in F1-score by 41%. Additionally, joint training decreases the WER of the ASR component by 28.3% on LibriSpeech under limited-resource fine-tuning. With these results, we show the importance of extending pretrained speech models to retain or re-learn important prosodic cues such as pitch accent.