Mila 0.13.48
Deep Neural Network Library
Statistics captured during a single generateStreaming() call.
Public Member Functions
| bool | valid () const noexcept | Returns true when at least one generation run has been recorded. |
Public Attributes
| float | decode_time_ms { 0.0f } | Total time spent in the autoregressive decode loop (ms); 0 when only one token was generated. |
| float | decode_tokens_per_second { 0.0f } | Decode throughput in tokens per second; 0 when the decode loop produced no tokens. |
| float | prefill_time_ms { 0.0f } | Time to first token: prefill forward pass + synchronization + first-token sampling (ms). |
| std::size_t | prompt_tokens { 0 } | Number of input prompt tokens processed during prefill. |
| std::size_t | tokens_generated { 0 } | Total tokens generated, including the first token produced by prefill. |
Statistics captured during a single generateStreaming() call.
Populated by the derived model's onGenerating() implementation after each generation run. Retrieve via getLastGenerationStatistics() once generateStreaming() returns.
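A minimal sketch of how these fields might be read after a run. The struct below is a stand-in that mirrors the documented members so the example compiles without Mila. The body of valid() (treating tokens_generated > 0 as "recorded") and the helper name formatStats are assumptions for illustration, not the library's implementation.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Stand-in mirroring the documented members; NOT the Mila definition.
struct GenerationStatistics {
    float prefill_time_ms{ 0.0f };
    float decode_time_ms{ 0.0f };
    float decode_tokens_per_second{ 0.0f };
    std::size_t prompt_tokens{ 0 };
    std::size_t tokens_generated{ 0 };

    // Assumption: a run counts as recorded once at least one token exists.
    [[nodiscard]] bool valid() const noexcept { return tokens_generated > 0; }
};

// Hypothetical helper: one-line summary, or a placeholder when nothing was recorded.
std::string formatStats(const GenerationStatistics& s) {
    if (!s.valid()) {
        return "no generation recorded";
    }
    std::ostringstream out;
    out << "prompt=" << s.prompt_tokens
        << " generated=" << s.tokens_generated
        << " ttft_ms=" << s.prefill_time_ms
        << " decode_ms=" << s.decode_time_ms
        << " decode_tok_per_s=" << s.decode_tokens_per_second;
    return out.str();
}
```

In real code the struct would come from getLastGenerationStatistics() after generateStreaming() has returned, and checking valid() first avoids reporting the all-zero defaults as a measurement.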
| bool Mila::Dnn::GenerationStatistics::valid () const | inline, nodiscard, noexcept |
Returns true when at least one generation run has been recorded.
| float Mila::Dnn::GenerationStatistics::decode_time_ms { 0.0f } |
Total time spent in the autoregressive decode loop (ms); 0 when only one token was generated.
| float Mila::Dnn::GenerationStatistics::decode_tokens_per_second { 0.0f } |
Decode throughput in tokens per second; 0 when the decode loop produced no tokens.
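The two zero cases above suggest how the throughput is likely derived: tokens produced by the decode loop divided by the decode time. The sketch below assumes the first token is attributed to prefill, so the decode loop produced tokens_generated - 1 tokens. The function name is hypothetical and the library's actual computation may differ.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Mila. Assumes the first token belongs to
// prefill, so the decode loop produced tokens_generated - 1 tokens.
float decodeTokensPerSecond(std::size_t tokens_generated, float decode_time_ms) {
    if (tokens_generated <= 1 || decode_time_ms <= 0.0f) {
        return 0.0f;  // decode loop produced no tokens, or no time was measured
    }
    const float decode_tokens = static_cast<float>(tokens_generated - 1);
    return decode_tokens / (decode_time_ms / 1000.0f);  // convert ms to s
}
```

Guarding both inputs keeps the result at 0 for the single-token case instead of dividing by zero.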
| float Mila::Dnn::GenerationStatistics::prefill_time_ms { 0.0f } |
Time to first token: prefill forward pass + synchronization + first-token sampling (ms).
| std::size_t Mila::Dnn::GenerationStatistics::prompt_tokens { 0 } |
Number of input prompt tokens processed during prefill.
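Combining prompt_tokens with prefill_time_ms gives a rough prefill throughput. This is a sketch under stated assumptions: the function name is hypothetical, and because prefill_time_ms also covers synchronization and first-token sampling, the result underestimates the speed of the forward pass alone.

```cpp
#include <cstddef>

// Hypothetical helper, not part of Mila. prefill_time_ms also covers
// synchronization and first-token sampling, so this underestimates the
// throughput of the forward pass alone.
float prefillTokensPerSecond(std::size_t prompt_tokens, float prefill_time_ms) {
    if (prompt_tokens == 0 || prefill_time_ms <= 0.0f) {
        return 0.0f;  // nothing processed, or nothing measured
    }
    return static_cast<float>(prompt_tokens) / (prefill_time_ms / 1000.0f);  // ms to s
}
```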
| std::size_t Mila::Dnn::GenerationStatistics::tokens_generated { 0 } |
Total tokens generated, including the first token produced by prefill.