This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the most common approach to incorporate language models into E2E-ASR decoding, we face two practical problems with LLMs: (1) LLM inference is computationally costly; (2) there may be a vocabulary mismatch between the ASR model and the LLM. Resolving this mismatch requires retraining the ASR model and/or the LLM, which is at best time-consuming and in many cases not feasible. We propose "delayed fusion," which applies LLM scores to ASR hypotheses with a delay during decoding and enables easier use of pre-trained LLMs in ASR tasks. This method can reduce not only the number of hypotheses scored by the LLM but also the number of LLM inference calls. It also allows re-tokenization of ASR hypotheses during decoding when the ASR model and the LLM employ different tokenizations. We demonstrate that delayed fusion provides improved decoding speed and accuracy compared with shallow fusion and N-best rescoring, using the LibriHeavy ASR corpus and three public LLMs: OpenLLaMA 3B & 7B and Mistral 7B.
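The contrast between the two fusion strategies can be sketched as follows. This is a toy illustration only, assuming a mock LLM scorer and hypothetical function names; the actual decoder, delay criterion, and scoring interpolation are defined in the paper.

```python
# Toy contrast of shallow vs. delayed fusion. All names, hypotheses, and
# scores are hypothetical; this only illustrates why delayed fusion needs
# fewer LLM inference calls.

def llm_score(text, counter):
    """Mock LLM: returns a dummy log-probability and counts inference calls."""
    counter["calls"] += 1
    return -0.1 * len(text.split())

def shallow_fusion(hypotheses, counter):
    # Shallow fusion: the LLM must score every hypothesis prefix at every
    # decoding step, so the call count grows with hypothesis length.
    scored = []
    for hyp in hypotheses:
        words = hyp.split()
        for step in range(1, len(words) + 1):
            score = llm_score(" ".join(words[:step]), counter)
        scored.append((hyp, score))
    return scored

def delayed_fusion(hypotheses, counter):
    # Delayed fusion (sketch): LLM scores are applied with a delay, only once
    # a hypothesis reaches a word boundary after re-tokenization. Distinct
    # token-level hypotheses that collapse to the same word sequence are
    # scored once, so fewer hypotheses and fewer calls reach the LLM.
    cache = {}
    scored = []
    for hyp in hypotheses:
        if hyp not in cache:
            cache[hyp] = llm_score(hyp, counter)
        scored.append((hyp, cache[hyp]))
    return scored

# Two of these word sequences coincide after re-tokenization.
hyps = ["the cat sat", "the cat sat", "the cat sang"]
c1, c2 = {"calls": 0}, {"calls": 0}
shallow_fusion(hyps, c1)
delayed_fusion(hyps, c2)
print(c1["calls"], c2["calls"])  # delayed fusion issues far fewer LLM calls
```

In this sketch the savings come from two sources that the abstract names: scoring is deferred to word boundaries rather than performed per token, and re-tokenized duplicates are scored only once.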