Class ByteModel

java.lang.Object
htsjdk.samtools.cram.compression.range.ByteModel

public class ByteModel extends Object
Adaptive frequency model for the CRAM 3.1 arithmetic range coder. Maintains per-symbol frequency counts and provides encode/decode operations that update the model after each symbol. Symbols are kept approximately sorted by descending frequency, so the linear scan in encode/decode usually reaches a frequent symbol within the first few entries.
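One way to keep symbols "approximately" sorted is a single bump-and-swap per update: add STEP to the coded symbol's frequency, then swap it with its predecessor if it now has the larger count. The sketch below assumes that rule (it is not taken from the ByteModel source), and the class ApproxSortSketch and its method are hypothetical. It works on the interleaved (frequency, symbol) array described in the next paragraph.

```java
// Hypothetical sketch of a single bump-and-swap step; NOT the htsjdk implementation.
// data[] is interleaved: data[2*slot] = frequency, data[2*slot + 1] = symbol.
final class ApproxSortSketch {
    static final int STEP = 16; // documented value of Constants.STEP

    /** Adds STEP to the frequency in the given slot; if that slot now outranks its
     *  predecessor, swaps the two (frequency, symbol) pairs. Returns the new slot. */
    static int bumpAndSwap(int[] data, int slot) {
        data[2 * slot] += STEP;
        if (slot > 0 && data[2 * slot] > data[2 * slot - 2]) {
            int f = data[2 * slot];
            int s = data[2 * slot + 1];
            data[2 * slot] = data[2 * slot - 2];     // move predecessor down
            data[2 * slot + 1] = data[2 * slot - 1];
            data[2 * slot - 2] = f;                  // move bumped pair up one slot
            data[2 * slot - 1] = s;
            return slot - 1;
        }
        return slot;
    }
}
```

Because a symbol can climb at most one slot per occurrence, the order converges toward descending frequency without ever paying for a full sort.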

Symbols and frequencies are interleaved in a single int[] array: even indices hold frequencies, odd indices hold symbol values. Compared with separate frequency and symbol arrays, this reduces cache misses during the linear scan, because each step reads a frequency and its symbol from adjacent memory.
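The interleaved layout and the linear scan it serves can be sketched as follows. This is an illustration only: InterleavedLayoutSketch, create and cumulativeFrequencyBefore are hypothetical names, and the scan returns the cumulative frequency of all slots before a symbol, which is the quantity an arithmetic coder needs.

```java
// Hypothetical illustration of the interleaved (frequency, symbol) layout.
final class InterleavedLayoutSketch {
    /** Builds the array for numSymbols symbols: frequency 1 each, natural order. */
    static int[] create(int numSymbols) {
        int[] data = new int[2 * numSymbols];
        for (int i = 0; i < numSymbols; i++) {
            data[2 * i] = 1;     // even index: frequency
            data[2 * i + 1] = i; // odd index: symbol value
        }
        return data;
    }

    /** Linear scan: sum of the frequencies in all slots before the one holding
     *  symbol, or -1 if the symbol is absent. Each step reads two adjacent ints. */
    static int cumulativeFrequencyBefore(int[] data, int symbol) {
        int cum = 0;
        for (int i = 0; i < data.length; i += 2) {
            if (data[i + 1] == symbol) {
                return cum;
            }
            cum += data[i];
        }
        return -1;
    }
}
```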

Each symbol starts with a frequency of 1. After a symbol is encoded or decoded, its frequency, and with it the total frequency, is incremented by Constants.STEP (16). When the total frequency exceeds Constants.MAX_FREQ, all frequencies are halved without letting any of them drop to zero.
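Written out as update rules, with s the symbol just coded, f_i the frequency in slot i and T the total frequency. The halving formula f - floor(f/2) = ceil(f/2) is an assumption about how zeros are avoided; it works because ceil(f/2) >= 1 whenever f >= 1. For example, 17 becomes 9 and 1 stays 1.

```latex
% s: symbol just coded; f_i: frequency of slot i; T = \sum_i f_i
\begin{aligned}
f_s &\leftarrow f_s + 16, \qquad T \leftarrow T + 16,\\
\text{if } T > \mathrm{MAX\_FREQ}:\quad f_i &\leftarrow f_i - \left\lfloor f_i / 2 \right\rfloor = \left\lceil f_i / 2 \right\rceil \ \text{for all } i, \qquad T \leftarrow \textstyle\sum_i f_i .
\end{aligned}
```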

  • Field Summary

    Fields
    Modifier and Type   Field and Description
    final int[]         data
                          Interleaved frequency/symbol pairs: data[i*2] = frequency, data[i*2+1] = symbol.
    final int           maxSymbol
    int                 totalFrequency
  • Constructor Summary

    Constructors
    Constructor and Description
    ByteModel(int numSymbols)
      Create a new model for the given number of distinct symbols (0 to numSymbols-1), each starting with frequency 1.

  • Method Summary

    Modifier and Type   Method and Description
    int                 modelDecode(ByteBuffer inBuffer, RangeCoder rangeCoder)
                          Decode one symbol from the compressed stream, update the model frequencies, and return the symbol.
    void                modelEncode(RangeCoder rangeCoder, int symbol)
                          Encode one symbol to the compressed stream and update the model frequencies.
    void                modelRenormalize()
                          Halve all frequencies (avoiding zeros) when the total frequency exceeds Constants.MAX_FREQ.
    void                reset()
                          Reset all frequencies to 1 and restore natural symbol ordering.

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • totalFrequency

      public int totalFrequency
    • maxSymbol

      public final int maxSymbol
    • data

      public final int[] data
      Interleaved frequency/symbol pairs: data[i*2] = frequency, data[i*2+1] = symbol. Keeping these adjacent improves cache hit rate during the linear scan in encode/decode.
  • Constructor Details

    • ByteModel

      public ByteModel(int numSymbols)
      Create a new model for the given number of distinct symbols (0 to numSymbols-1), each starting with frequency 1.
      Parameters:
      numSymbols - number of distinct symbols this model can encode/decode
  • Method Details

    • reset

      public void reset()
      Reset all frequencies to 1 and restore natural symbol ordering.
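The constructor and reset() describe the same starting state, so a sketch can have the constructor delegate to reset(). ModelInitSketch is hypothetical; its field names mirror the documented public fields, but setting maxSymbol to numSymbols - 1 and totalFrequency to the number of symbols are assumptions inferred from the constructor description, not taken from the source.

```java
// Hypothetical sketch of construction and reset(); not the htsjdk source.
final class ModelInitSketch {
    int totalFrequency;
    final int maxSymbol;
    final int[] data; // interleaved: even = frequency, odd = symbol

    ModelInitSketch(int numSymbols) {
        this.maxSymbol = numSymbols - 1; // ASSUMPTION: highest valid symbol value
        this.data = new int[2 * numSymbols];
        reset();
    }

    /** Frequency 1 for every symbol, natural symbol order, total = symbol count. */
    void reset() {
        int n = data.length / 2;
        for (int i = 0; i < n; i++) {
            data[2 * i] = 1;
            data[2 * i + 1] = i;
        }
        totalFrequency = n;
    }
}
```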
    • modelDecode

      public int modelDecode(ByteBuffer inBuffer, RangeCoder rangeCoder)
      Decode one symbol from the compressed stream, update the model frequencies, and return the symbol.
      Parameters:
      inBuffer - the compressed input stream
      rangeCoder - the range coder state (must have been started with RangeCoder.rangeDecodeStart(ByteBuffer))
      Returns:
      the decoded symbol value
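A minimal sketch of the model-side work in modelDecode(), assuming the range coder has already reduced its state to a target cumulative frequency in [0, totalFrequency); how htsjdk's RangeCoder computes that value is not shown on this page, so it is passed in directly. DecodeSketch, decode(int) and the MAX_FREQ value are hypothetical; the scan, STEP increment, halving and single swap follow the class description.

```java
// Hypothetical sketch of the model side of decoding; no htsjdk API is called.
final class DecodeSketch {
    static final int STEP = 16;                 // documented Constants.STEP
    static final int MAX_FREQ = (1 << 16) - 17; // ASSUMED value of Constants.MAX_FREQ

    int totalFrequency;
    final int[] data; // interleaved: even = frequency, odd = symbol

    DecodeSketch(int numSymbols) {
        data = new int[2 * numSymbols];
        for (int i = 0; i < numSymbols; i++) { data[2 * i] = 1; data[2 * i + 1] = i; }
        totalFrequency = numSymbols;
    }

    /** target must be in [0, totalFrequency); larger values run off the array. */
    int decode(int target) {
        int i = 0, cum = 0;
        while (cum + data[i] <= target) { // find slot whose [cum, cum+freq) holds target
            cum += data[i];
            i += 2;
        }
        int symbol = data[i + 1];
        // A real decoder would now narrow its range using (cum, data[i], totalFrequency).
        data[i] += STEP;
        totalFrequency += STEP;
        if (totalFrequency > MAX_FREQ) halve();
        if (i > 0 && data[i] > data[i - 2]) { // keep approximately sorted
            int f = data[i]; data[i] = data[i - 2]; data[i - 2] = f;
            int s = data[i + 1]; data[i + 1] = data[i - 1]; data[i - 1] = s;
        }
        return symbol;
    }

    private void halve() {
        totalFrequency = 0;
        for (int i = 0; i < data.length; i += 2) {
            data[i] -= data[i] >> 1; // ceil(f/2), never 0 for f >= 1
            totalFrequency += data[i];
        }
    }
}
```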
    • modelRenormalize

      public void modelRenormalize()
      Halve all frequencies (avoiding zeros) when the total frequency exceeds Constants.MAX_FREQ.
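The halving step on the interleaved array can be sketched as below. Rounding up via f - (f >> 1) is an assumption about how zeros are avoided; RenormalizeSketch and halve are hypothetical names. Symbols are left in place, only the even (frequency) entries change.

```java
// Hypothetical sketch of renormalization; the ceil-rounding rule is an ASSUMPTION.
final class RenormalizeSketch {
    /** Halves every frequency in an interleaved (frequency, symbol) array,
     *  rounding up so no frequency becomes 0. Returns the new total frequency. */
    static int halve(int[] data) {
        int total = 0;
        for (int i = 0; i < data.length; i += 2) {
            data[i] -= data[i] >> 1; // f - floor(f/2) == ceil(f/2)
            total += data[i];
        }
        return total;
    }
}
```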
    • modelEncode

      public void modelEncode(RangeCoder rangeCoder, int symbol)
      Encode one symbol to the compressed stream and update the model frequencies. Output is written to the range coder's internal byte[] buffer.
      Parameters:
      rangeCoder - the range coder state (must have output set via RangeCoder.setOutput(byte[], int))
      symbol - the symbol value to encode (must be in range 0 to maxSymbol)
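A minimal sketch of the model-side work in modelEncode(). Instead of calling a range coder, encode() returns the interval {cumulative frequency, frequency, total frequency} that a real encoder would hand to the coder; no htsjdk RangeCoder method is used or implied. EncodeSketch, encode(int) and the MAX_FREQ value are hypothetical.

```java
// Hypothetical sketch of the model side of encoding; no htsjdk API is called.
final class EncodeSketch {
    static final int STEP = 16;                 // documented Constants.STEP
    static final int MAX_FREQ = (1 << 16) - 17; // ASSUMED value of Constants.MAX_FREQ

    int totalFrequency;
    final int[] data; // interleaved: even = frequency, odd = symbol

    EncodeSketch(int numSymbols) {
        data = new int[2 * numSymbols];
        for (int i = 0; i < numSymbols; i++) { data[2 * i] = 1; data[2 * i + 1] = i; }
        totalFrequency = numSymbols;
    }

    /** Returns {cumFreq, freq, totalFreq} for symbol, then updates the model.
     *  An out-of-range symbol runs off the end of the array. */
    int[] encode(int symbol) {
        int i = 0, cum = 0;
        while (data[i + 1] != symbol) { // linear scan, accumulating frequencies
            cum += data[i];
            i += 2;
        }
        int[] interval = {cum, data[i], totalFrequency};
        data[i] += STEP;
        totalFrequency += STEP;
        if (totalFrequency > MAX_FREQ) halve();
        if (i > 0 && data[i] > data[i - 2]) { // keep approximately sorted
            int f = data[i]; data[i] = data[i - 2]; data[i - 2] = f;
            int s = data[i + 1]; data[i + 1] = data[i - 1]; data[i - 1] = s;
        }
        return interval;
    }

    private void halve() {
        totalFrequency = 0;
        for (int i = 0; i < data.length; i += 2) {
            data[i] -= data[i] >> 1; // ceil(f/2), never 0 for f >= 1
            totalFrequency += data[i];
        }
    }
}
```

Because the encoder and decoder apply the same update after every symbol, their models stay in lockstep, which is what lets the decoder recover each symbol from a cumulative frequency alone.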