Class CompressionUtils
java.lang.Object
htsjdk.samtools.cram.compression.CompressionUtils
Utility methods shared across CRAM 3.1 compression codecs (rANS, Range, Name Tokeniser, etc.),
including uint7 encoding, bit-packing, and STRIPE data transformation.
-
Constructor Summary
Constructors
CompressionUtils()
-
Method Summary
Modifier and Type / Method / Description
static ByteBuffer allocateByteBuffer(int bufferSize)
    Allocate a new little-endian ByteBuffer of the specified size.
static ByteBuffer allocateOutputBuffer(int inSize)
    Allocate an output buffer large enough to hold compressed rANS data, including worst-case frequency table overhead and header bytes.
static int[] buildStripeUncompressedSizes(int totalSize)
    Compute the uncompressed size for each stripe stream.
static ByteBuffer decodePack(ByteBuffer inBuffer, byte[] packMappingTable, int numSymbols, int uncompressedPackOutputLength)
    Unpack bit-packed data back to one byte per symbol, reversing the transformation performed by encodePack(ByteBuffer, ByteBuffer, int[], int[], int).
static ByteBuffer encodePack(ByteBuffer inBuffer, ByteBuffer outBuffer, int[] frequencyTable, int[] packMappingTable, int numSymbols)
    Pack input symbols into a smaller number of bits per value based on the number of distinct symbols.
static int getStripeNumStreams()
    Return the number of streams used by the STRIPE codec (always 4).
static int readUint7(byte[] buf, int[] posHolder)
    Read uint7 from byte[] at posHolder[0], advancing posHolder[0].
static int readUint7(ByteBuffer cp)
    Read an unsigned integer using 7-bit variable-length encoding (uint7).
static ByteBuffer slice(ByteBuffer inputBuffer)
    Create a little-endian slice of the given ByteBuffer (from position to limit).
static ByteBuffer[] stripeTranspose(ByteBuffer inBuffer, int[] sizes)
    Transpose (de-interleave) input data into N=4 separate streams using round-robin byte distribution.
static byte[] toByteArray(ByteBuffer buffer)
    Return a byte array with contents matching the ByteBuffer from position 0 to limit.
static ByteBuffer wrap(byte[] inputBytes)
    Wrap a byte array in a little-endian ByteBuffer.
static void writeUint7(int i, byte[] buf, int[] posHolder)
    Write uint7 into byte[] at posHolder[0], advancing posHolder[0].
static void writeUint7(int i, ByteBuffer cp)
    Write an unsigned integer using 7-bit variable-length encoding (uint7).
-
Constructor Details
-
CompressionUtils
public CompressionUtils()
-
-
Method Details
-
writeUint7
public static void writeUint7(int i, ByteBuffer cp)
Write an unsigned integer using 7-bit variable-length encoding (uint7). Each output byte uses 7 bits for data and the high bit as a continuation flag (1 = more bytes follow).
Parameters:
i - the value to write (must be non-negative)
cp - the output buffer
-
readUint7
public static int readUint7(ByteBuffer cp)
Read an unsigned integer using 7-bit variable-length encoding (uint7). Each byte uses 7 bits for data and the high bit as a continuation flag (1 = more bytes follow).
Parameters:
cp - the input buffer
Returns:
the decoded unsigned integer value
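The encoding described above can be illustrated with a small standalone sketch. It is not the htsjdk implementation: the class name `Uint7Sketch` is hypothetical, and the code assumes that the 7-bit groups are written most significant group first, which the description above does not state explicitly.

```java
import java.nio.ByteBuffer;

// Standalone sketch of a uint7 codec; NOT the htsjdk implementation.
// Assumption: 7-bit groups are emitted most significant group first.
class Uint7Sketch {
    static void writeUint7(int v, ByteBuffer out) {
        if (v < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        // Find the bit offset of the highest non-empty 7-bit group.
        // The bound keeps the shift distance below 32 for int values.
        int shift = 0;
        while (shift < 28 && (v >>> (shift + 7)) != 0) {
            shift += 7;
        }
        // Every group except the last carries the continuation flag (high bit set).
        for (; shift > 0; shift -= 7) {
            out.put((byte) (0x80 | ((v >>> shift) & 0x7F)));
        }
        out.put((byte) (v & 0x7F));
    }

    static int readUint7(ByteBuffer in) {
        int value = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0); // high bit set means more bytes follow
        return value;
    }
}
```

Under these assumptions, small values (0 to 127) occupy a single byte, and the largest non-negative int occupies five bytes.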
-
writeUint7
public static void writeUint7(int i, byte[] buf, int[] posHolder)
Write a uint7-encoded value into buf starting at index posHolder[0], then advance posHolder[0] past the bytes written.
-
readUint7
public static int readUint7(byte[] buf, int[] posHolder)
Read a uint7-encoded value from buf starting at index posHolder[0], then advance posHolder[0] past the bytes consumed.
-
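The byte[] overloads use a one-element int[] as a mutable cursor, so successive calls continue where the previous one stopped. The following standalone sketch shows that pattern. The class name `Uint7ArraySketch` is hypothetical, this is not the htsjdk code, and the most-significant-group-first byte order is an assumption.

```java
// Standalone sketch of the posHolder cursor pattern; NOT the htsjdk implementation.
// Assumption: 7-bit groups are stored most significant group first.
class Uint7ArraySketch {
    static void writeUint7(int v, byte[] buf, int[] posHolder) {
        if (v < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        int pos = posHolder[0];
        int shift = 0;
        while (shift < 28 && (v >>> (shift + 7)) != 0) {
            shift += 7;
        }
        for (; shift > 0; shift -= 7) {
            buf[pos++] = (byte) (0x80 | ((v >>> shift) & 0x7F));
        }
        buf[pos++] = (byte) (v & 0x7F);
        posHolder[0] = pos; // publish the advanced cursor to the caller
    }

    static int readUint7(byte[] buf, int[] posHolder) {
        int pos = posHolder[0];
        int value = 0;
        int b;
        do {
            b = buf[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F);
        } while ((b & 0x80) != 0);
        posHolder[0] = pos; // publish the advanced cursor to the caller
        return value;
    }
}
```

A caller would typically create `int[] pos = {0}` once and pass the same holder to every call that reads or writes consecutive values.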
encodePack
public static ByteBuffer encodePack(ByteBuffer inBuffer, ByteBuffer outBuffer, int[] frequencyTable, int[] packMappingTable, int numSymbols)
Pack input symbols into a smaller number of bits per value based on the number of distinct symbols. Writes the pack header (symbol count, mapping table, packed length) to outBuffer and returns the packed data as a separate buffer.
Parameters:
inBuffer - the input data to pack
outBuffer - the output buffer for the pack header (symbol count, mapping table, packed length)
frequencyTable - frequency counts for each byte value (0-255)
packMappingTable - mapping from original symbol to packed value
numSymbols - the number of distinct symbols in the input
Returns:
a ByteBuffer containing the packed data
-
decodePack
public static ByteBuffer decodePack(ByteBuffer inBuffer, byte[] packMappingTable, int numSymbols, int uncompressedPackOutputLength)
Unpack bit-packed data back to one byte per symbol, reversing the transformation performed by encodePack(ByteBuffer, ByteBuffer, int[], int[], int).
Parameters:
inBuffer - the packed input data
packMappingTable - mapping from packed value back to original symbol
numSymbols - the number of distinct symbols (determines bits per value)
uncompressedPackOutputLength - the expected number of output bytes
Returns:
a ByteBuffer containing the unpacked data
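To make the pack and unpack round trip concrete, here is a minimal standalone sketch that works on plain byte arrays and omits the pack header entirely. It is not the htsjdk implementation. The class `PackSketch`, its method signatures, and the choice to store the first value in the lowest bits of each output byte are all assumptions made for illustration; `bits` must be 1, 2, or 4.

```java
// Standalone sketch of bit-packing; NOT the htsjdk implementation.
// Assumptions: no header is written, and the first value of each group
// occupies the lowest bits of the packed byte.
class PackSketch {
    // mapping[symbol] gives the packed code for each original byte value (0-255).
    static byte[] pack(byte[] input, int[] mapping, int bits) {
        int perByte = 8 / bits; // values stored in each output byte
        byte[] out = new byte[(input.length + perByte - 1) / perByte];
        for (int i = 0; i < input.length; i++) {
            int code = mapping[input[i] & 0xFF];
            out[i / perByte] |= (byte) (code << ((i % perByte) * bits));
        }
        return out;
    }

    // reverseMapping[code] gives the original symbol for each packed code.
    static byte[] unpack(byte[] packed, byte[] reverseMapping, int bits, int outLength) {
        int perByte = 8 / bits;
        int mask = (1 << bits) - 1;
        byte[] out = new byte[outLength];
        for (int i = 0; i < outLength; i++) {
            int code = ((packed[i / perByte] & 0xFF) >>> ((i % perByte) * bits)) & mask;
            out[i] = reverseMapping[code];
        }
        return out;
    }
}
```

With four distinct symbols and 2 bits per value, six input bytes pack into two bytes. The explicit output length is needed when unpacking because the final packed byte may be only partly filled.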
-
allocateOutputBuffer
public static ByteBuffer allocateOutputBuffer(int inSize)
Allocate an output buffer large enough to hold compressed rANS data, including worst-case frequency table overhead and header bytes.
Parameters:
inSize - the uncompressed input size
Returns:
a little-endian ByteBuffer sized for the worst-case compressed output
-
allocateByteBuffer
public static ByteBuffer allocateByteBuffer(int bufferSize)
Allocate a new little-endian ByteBuffer of the specified size.
Parameters:
bufferSize - the capacity of the buffer
Returns:
a new little-endian ByteBuffer
-
wrap
public static ByteBuffer wrap(byte[] inputBytes)
Wrap a byte array in a little-endian ByteBuffer.
Parameters:
inputBytes - the byte array to wrap
Returns:
a little-endian ByteBuffer backed by the input array
-
slice
public static ByteBuffer slice(ByteBuffer inputBuffer)
Create a little-endian slice of the given ByteBuffer (from position to limit).
Parameters:
inputBuffer - the buffer to slice
Returns:
a new little-endian ByteBuffer sharing the input's content
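The three buffer helpers above (allocateByteBuffer, wrap, slice) can be approximated with plain java.nio calls, as in the sketch below. The class name `LittleEndianBuffers` is hypothetical and this is not the htsjdk code. One detail worth noting is that `ByteBuffer.slice()` always returns a big-endian buffer regardless of the source buffer's order, so the byte order has to be set again on the slice.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Standalone sketch of little-endian buffer helpers; NOT the htsjdk implementation.
class LittleEndianBuffers {
    static ByteBuffer allocate(int size) {
        return ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
    }

    static ByteBuffer wrap(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
    }

    static ByteBuffer slice(ByteBuffer in) {
        // slice() does not inherit the byte order, so set it explicitly.
        return in.slice().order(ByteOrder.LITTLE_ENDIAN);
    }
}
```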
-
buildStripeUncompressedSizes
public static int[] buildStripeUncompressedSizes(int totalSize)
Compute the uncompressed size for each stripe stream. Earlier streams get the extra bytes when totalSize is not evenly divisible by the number of streams.
Parameters:
totalSize - the total uncompressed size
Returns:
array of per-stream sizes
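One way to compute such sizes is sketched below, assuming four streams and that each of the first (totalSize mod 4) streams receives exactly one extra byte. The class `StripeSizesSketch` is hypothetical and this is not the htsjdk implementation.

```java
// Standalone sketch of per-stream size computation; NOT the htsjdk implementation.
// Assumption: the remainder is handed out one byte at a time to the earliest streams.
class StripeSizesSketch {
    static final int NUM_STREAMS = 4;

    static int[] sizes(int totalSize) {
        int[] sizes = new int[NUM_STREAMS];
        int base = totalSize / NUM_STREAMS;
        int remainder = totalSize % NUM_STREAMS;
        for (int i = 0; i < NUM_STREAMS; i++) {
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }
}
```

For example, a total size of 10 yields the sizes 3, 3, 2, 2 under this scheme.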
-
stripeTranspose
public static ByteBuffer[] stripeTranspose(ByteBuffer inBuffer, int[] sizes)
Transpose (de-interleave) input data into N=4 separate streams using round-robin byte distribution. Stream i gets bytes at positions i, i+4, i+8, ...
Parameters:
inBuffer - the input data (position to limit)
sizes - per-stream uncompressed sizes from buildStripeUncompressedSizes(int)
Returns:
array of ByteBuffers, one per stream
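The round-robin distribution can be sketched as follows. This is a standalone illustration rather than the htsjdk implementation: the class `StripeTransposeSketch` is hypothetical, the number of streams is taken from the length of `sizes`, and returning each stream flipped and ready for reading is an assumption.

```java
import java.nio.ByteBuffer;

// Standalone sketch of STRIPE de-interleaving; NOT the htsjdk implementation.
// Assumption: returned buffers are flipped (position 0, limit = stream size).
class StripeTransposeSketch {
    static ByteBuffer[] transpose(ByteBuffer in, int[] sizes) {
        int n = sizes.length;
        ByteBuffer[] streams = new ByteBuffer[n];
        for (int i = 0; i < n; i++) {
            streams[i] = ByteBuffer.allocate(sizes[i]);
        }
        int start = in.position();
        int length = in.remaining();
        // Byte k of the input goes to stream (k mod n); absolute reads
        // leave the input buffer's position untouched.
        for (int k = 0; k < length; k++) {
            streams[k % n].put(in.get(start + k));
        }
        for (ByteBuffer stream : streams) {
            stream.flip();
        }
        return streams;
    }
}
```

With ten input bytes 0 through 9 and sizes 3, 3, 2, 2, stream 0 holds 0, 4, 8 and stream 3 holds 3, 7.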
-
getStripeNumStreams
public static int getStripeNumStreams()
Returns:
the number of streams used by the STRIPE codec (always 4)
-
toByteArray
public static byte[] toByteArray(ByteBuffer buffer)
Return a byte array with contents matching the ByteBuffer from position 0 to limit. If the buffer is backed by an array whose length exactly equals the buffer's limit, the backing array itself is returned (no copy). Otherwise the data is copied into a new array.
Parameters:
buffer - the source ByteBuffer
Returns:
a byte array containing the buffer's data
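The copy-avoidance behaviour described above might look like the following sketch. The class `ToByteArraySketch` is hypothetical and this is not the htsjdk code; the extra check on `arrayOffset()` is an assumption added so that a buffer starting partway into its backing array is always copied.

```java
import java.nio.ByteBuffer;

// Standalone sketch of toByteArray; NOT the htsjdk implementation.
class ToByteArraySketch {
    static byte[] toByteArray(ByteBuffer buffer) {
        // Fast path: the backing array already holds exactly bytes 0..limit.
        if (buffer.hasArray()
                && buffer.arrayOffset() == 0
                && buffer.array().length == buffer.limit()) {
            return buffer.array();
        }
        // Slow path: copy from index 0 to limit without disturbing the caller's position.
        byte[] copy = new byte[buffer.limit()];
        ByteBuffer view = buffer.duplicate();
        view.rewind();
        view.get(copy);
        return copy;
    }
}
```

Because the fast path returns the backing array itself, a caller that modifies the result may also modify the buffer's contents.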
-