NEWS

Rduckhts 1.3.0.9000-0.1.0

update the bundled DuckDB C API headers to DuckDB v1.5.3 while keeping stable extension ABI metadata at v1.2.0; bundled SQL sessions loaded through rduckhts_load() now expose DuckDB runtime type-support probes for VARIANT and GEOMETRY
simplify the bundled rduckhts_bcftools_norm() / duckhts_bcftools_norm(...) site-preserving table-macro query shape by removing the extra correlated scalar LATERAL subquery around bcftools_norm_row(...), eliminating the site-preserving LEFT_DELIM_JOIN plan overhead while preserving split-mode ALT row semantics and caller columns whose names collide with DuckHTS helper-column names used internally by earlier macro forms; add tinytest coverage for DuckDB's suffixed behavior when callers already have normalized-output column names
expose bundled reader scan_mode = "auto"|"sequential" controls through the R wrappers and multi-file helpers for read_bcf, read_bam, read_fasta, read_fastq, read_bed, read_gff, read_gtf, and read_tabix, so callers can force full-file streaming/counting instead of index-backed count or parallel scan paths where applicable; sequential mode is rejected for region queries
optimize bundled bcftools_norm_row(...) / rduckhts_bcftools_norm() for already-normalized plain ACGTN allele rows by skipping kstring left-realignment setup when trim predicates prove the row is unchanged; avoid per-row FASTA path duplication after the vector-local cache is established, reuse larger bounded per-thread reference windows, and document/defensively serialize htslib FASTA fetches while keeping normalization reference caches thread-local to avoid the faidx cache race class fixed in https://github.com/RGenomicsETL/duckhts/issues/17 / https://github.com/RGenomicsETL/duckhts/pull/18
make bundled rduckhts_bcftools_norm() / duckhts_bcftools_norm(...) gVCF-aware for vt/vcfnorm-style row normalization: <NON_REF> and <*> reference-block alleles now pass through with GVCFReferenceBlock, and mixed real-plus-gVCF-symbolic alleles normalize the real alleles while preserving symbolic alleles and caller-supplied reference-block END in site-preserving output; mixed * plus real alleles now follows the same ignored-symbolic path, while *-only rows remain SpanningDeletion; bundled phased GT/PL/GP/DS/PS FORMAT fixtures, including haploid/triploid/tetraploid Number=G cardinality cases, and tinytests pin phase-separator preservation through read_bcf(...)
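The Number=G cardinality cases named in the entry above follow from the VCF convention that a Number=G field carries one value per possible unordered genotype. A minimal Python sketch of that count, assuming the standard combinations-with-repetition formula; the function name is illustrative and is not part of Rduckhts or the bundled extension:

```python
from math import comb


def genotype_value_count(n_alt: int, ploidy: int) -> int:
    """Expected number of values in a Number=G field (e.g. PL, GP) for one sample.

    Illustrative helper only: counts unordered genotypes of the given ploidy
    drawn from REF plus n_alt ALT alleles.
    """
    if n_alt < 0 or ploidy < 1:
        raise ValueError("n_alt must be >= 0 and ploidy must be >= 1")
    n_alleles = n_alt + 1  # REF plus every ALT allele
    return comb(n_alleles + ploidy - 1, ploidy)
```

For a biallelic site this gives 2 values for a haploid call, 3 for diploid, and 4 for triploid, which is the kind of shape such fixtures would need to pin.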
add thin DBI wrappers rduckhts_bcf_convert_parquet(), rduckhts_bam_convert_parquet(), rduckhts_gff_convert_parquet(), and rduckhts_tabix_convert_parquet() around the bundled extension SQL builders duckhts_*_convert_parquet_sql(...); these convert DuckHTS scans to Parquet with DuckHTS write-format metadata, preserved raw headers, optional corrected header text, SQL-filter provenance, selected-column/partition metadata, arbitrary user metadata via R named lists/extension metadata := map(...), optional caller-managed JSON-file metadata when DuckDB's json extension is available, and partitioned-output support for DuckLake-style registration of premade Parquet files
include the final VCF #CHROM/sample header line in bundled read_hts_header(..., mode := 'raw'), so Parquet metadata written from VCF/BCF inputs has the complete header needed for future VCF/BCF regeneration
start the post-1.3.0 development cycle for the bundled duckhts extension

Rduckhts 1.3.0-0.1.0 (2026-05-29)

expose the rebuilt capability-mask SIMD dispatch diagnostics through rduckhts_simd_kernel_info(), keeping R wrappers thin while reporting one row per logical kernel and preserving backend-agnostic SQL/R conformance tests for seq_gc_content(...)
harden bundled SIMD backend helpers: retain extension-owned backend-name validation while restoring R-side scalar/non-missing argument shape checks, clarify selectable versus available diagnostics in generated docs, and preserve ASCII SQL quotes for the rendered duckhts_simd_set_backend('auto'|'scalar'|backend) catalog call
remove htslib autoconf HAVE_* macro guards from all bundled SIMD backend translation units; compile-time gate is now defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__)) for x86 backends, available without autoconf; runtime dispatch and scalar fallback behavior are unchanged; add scalar-vs-auto backend R correctness tests for rduckhts_simd_set_backend() / seq_gc_content(...) covering GC=0/0.5/1.0, embedded-N calling, and soft-masked lowercase bases
drop internal .validate_simd_backend() R helper from SIMD wrapper functions; backend-name normalization and validation now belong to the extension, while the R wrappers only enforce that backend is a single non-missing character string

Rduckhts 1.2.1-0.1.0 (2026-05-07)

expose bundled SIMD diagnostics and explicit backend selection through SQL functions and R helpers rduckhts_simd_backend(), rduckhts_simd_requested_backend(), rduckhts_simd_backend_available(), and rduckhts_simd_set_backend(), route bundled seq_gc_content(...) through the new eager scalar/optional-AVX2 runtime dispatch scaffold while preserving scalar fallback behavior on ARM, wasm, and scalar-only builds, add runtime-gated AVX-512, ARM NEON, and wasm SIMD128 backend translation units where compiler-supported, keep the manual duckhts_build() rebuild path wired to the SIMD sources, and add README examples for the scalar/auto SIMD flow
fix bundled rduckhts_liftover() / bcftools_liftover(...) FASTA contig alias handling during source/destination reference validation and sequence fetches, and align bundled spanning-deletion * allele handling with upstream bcftools +liftover: inputs such as 23, 24, 26, X, Y, MT, and chr* aliases now resolve through the same canonical path, avoiding spurious SourceRefMismatch rejects for X/Y/MT indels when the bundled chain names and FASTA names differ only by canonical aliasing; bundled *-allele rows now follow upstream swap/ref-add semantics instead of taking the symbolic short-circuit path, full-file GIAB conformance against installed bcftools +liftover is now exact, and bundled SQL/tinytest coverage now pins the 23 -> chrX, SWAP=2, and SWAP=-1 regressions
bundle the official VariantKey / RegionKey C API (Nicola Asuni, 2018; https://doi.org/10.1101/473744) and expose new SQL helpers through rduckhts_load() sessions for both the bcftools-style and raw upstream numeric surfaces: variantkey(...) now matches bcftools %VKX / +add-variantkey on 1-based VCF rows, large/ambiguous/symbolic alleles keep the official hashed nonreversible mode, regionkey(...) adds 0-based half-open span keys plus overlap helpers, bundled tinytests pin reversible and hashed cases, and the package README now includes concrete DBI examples for VariantKey / RegionKey usage
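The entry above mixes two coordinate conventions: 1-based VCF rows and 0-based half-open spans. A minimal Python sketch of the conversion and of a half-open overlap test, assuming the usual definitions; both function names are hypothetical, and nothing here reproduces the VariantKey / RegionKey encoding itself:

```python
from typing import Tuple


def vcf_to_half_open(pos: int, ref: str) -> Tuple[int, int]:
    """Convert a 1-based VCF POS/REF pair to a 0-based half-open [start, end) span."""
    if pos < 1 or not ref:
        raise ValueError("pos must be >= 1 and ref must be non-empty")
    start = pos - 1
    return start, start + len(ref)


def half_open_overlap(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True when two 0-based half-open spans share at least one base."""
    return a[0] < b[1] and b[0] < a[1]
```

Under this convention, spans that merely touch (one ends where the other starts) do not overlap.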
fix bundled rduckhts_bcftools_norm(..., split_multiallelic = TRUE) row preservation for ref-only and empty-ALT inputs: rows with ALT='.', NULL ALT values, empty ALT lists, or NULL ALT list elements no longer disappear from split-mode DBI results, bundled tinytest coverage now pins the expected RefOnly / NullInput statuses and alt_index behavior, bundled rduckhts_bcf() / rduckhts_bcf_multi() now expose decompression_threads = 0 for explicit htslib worker-thread control on bgzipped VCF/BCF reads, and the package README now includes a concrete normalization example
fix bundled helper-return metadata for omitted output paths: rduckhts_fasta_index() now returns the generated .fai path when index_path = NULL instead of an empty string, bundled regression coverage now also pins default-path returns for BGZF compression/decompression and BAM/BCF/tabix index builders, and the rduckhts_bgzip() / rduckhts_bgunzip() wrappers now correctly propagate keep = FALSE instead of silently falling back to the extension default keep := TRUE
add rduckhts_bcftools_norm() and bundle bcftools_norm_row(...) / duckhts_bcftools_norm(...) for bcftools/vt-style FASTA-backed variant normalization from DBI queries: ALT inputs may be either comma-delimited VARCHAR or VARCHAR[], the bundled result appends pos_normed, end_pos_normed, ref_normed, alt_normed, normed, and norm_status, split mode emits one row per ALT with alt_index, and bundled SQL/tinytest coverage now exercises sequence, multiallelic, symbolic <DEL>/<DUP>, and missing-contig rows
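For readers unfamiliar with vt-style normalization, the trimming and left-alignment idea can be sketched in a few lines of Python. This is the textbook algorithm applied to an in-memory contig string, not the bundled C implementation: the function name is hypothetical, symbolic alleles and status reporting are ignored, and a variant that would need to extend past the contig start simply raises.

```python
from typing import List, Tuple


def normalize_alleles(contig_seq: str, pos: int, ref: str,
                      alts: List[str]) -> Tuple[int, str, List[str]]:
    """Left-align and trim one variant; pos is 1-based, contig_seq is the full contig."""
    alleles = [ref.upper()] + [alt.upper() for alt in alts]
    # Leave ref-only or degenerate rows untouched in this sketch.
    if any(not a for a in alleles) or len(set(alleles)) < 2:
        return pos, ref, list(alts)

    changed = True
    while changed:
        changed = False
        # Right-trim: drop a trailing base shared by every allele.
        if len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If trimming emptied an allele, extend all alleles one base to the left.
        if any(not a for a in alleles):
            if pos == 1:
                raise ValueError("cannot left-extend past the start of the contig")
            pos -= 1
            left_base = contig_seq[pos - 1].upper()
            alleles = [left_base + a for a in alleles]
            changed = True

    # Left-trim: drop a shared leading base while every allele keeps >= 1 base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles[0], alleles[1:]
```

On the contig `GGGCACACACAGGG`, a `CAC>C` deletion written at position 8 shifts left through the CA repeat and comes back as `GCA>G` at position 3.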
fix bundled rduckhts_liftover() / bcftools_liftover(...) indel parity in two exact upstream rewrite points: repeat-run source extension now keeps extending across the cached source-reference window boundary when needed, and the bundled clip-pad Needleman-Wunsch path now keeps the best shift even when candidate alignment scores are negative instead of leaving padded intervals unshifted; bundled SQL/tinytest coverage now includes dedicated repeat-run and clip-pad regression fixtures, and the real-data conformance workflow reaches exact parity with installed bcftools +liftover on GIAB HG001 chr20 plus the full HG006 GRCh37 benchmark VCF
fix bundled rduckhts_liftover() / bcftools_liftover(...) row rejection for invalid source-reference indel and difficult-SNP inputs: rows that fail the source-FASTA validation path now stay in the result with mapped = FALSE and reject_reason = 'SourceRefMismatch' instead of fabricating padded lifted alleles or aborting the query; bundled tests and README examples now reflect the reject-row behavior
add rduckhts_pileup() and bundle native read_pileup(...) for region-scoped BAM pileups with per-position chrom, pos, depth, bases, and quals; expose bundled read_bam(..., cigar_representation := 'binary') through rduckhts_bam(..., cigar_representation = "binary") and multi-file BAM wrappers, returning packed BAM CIGAR ops as UINTEGER[]; and expose explicit gzi_path arguments in rduckhts_fasta(), rduckhts_fasta_multi(), and rduckhts_fasta_nuc() so packaged bgzipped FASTA workflows can use relocated .gzi sidecars
speed up bundled rduckhts_fasta_nuc() / fasta_nuc(...) nucleotide counting on capable x86_64 hosts with an AVX2+popcnt fast path selected via htslib-style runtime dispatch, while preserving the scalar fallback everywhere else
improve bundled remote HTS performance for long-running scans and rduckhts_bam_index(): native remote BAM/BCF/tabix/FASTA/BED reads now apply htslib block/cache tuning by access pattern, while wasm/browser builds use the same policy with smaller budgets appropriate for the XHR-backed worker runtime; the bundled vendored htslib also now exposes a pre-opened sam_index_build4(...) entry point so bam_index(...) can be tuned before remote index construction begins
fix bundled rduckhts_bcf() / read_bcf(...) scanning stability for records where FILTER lists were emitted without reserving list-vector capacity, which could crash with allocator corruption (double free/invalid pointer) during full-table reads; FILTER entries now reserve child-list space before writes and scans are stable on files previously triggering crashes
compile bundled DuckHTS extension sources with -Wpedantic during Unix and Windows package builds while leaving vendored htslib on its upstream warning flags
fix the bundled non-Emscripten wasm_http_hfile.c translation unit so native package builds do not warn about an empty source file under pedantic C diagnostics
harden Windows configure.win libcurl detection: the package now requires a successful curl_easy_init link using the detected pkg-config libcurl dependency closure before enabling htslib remote URL support, and otherwise disables libcurl/S3/GCS cleanly

Rduckhts 1.2.0-0.1.0

expose richer bundled GFF/GTF parsed attribute outputs through rduckhts_gff() / rduckhts_gtf() and multi-file wrappers: attributes_list = TRUE returns MAP(VARCHAR, VARCHAR[]) with grouped multi-values and GFF3 percent-decoding, while attributes_pairs = TRUE returns LIST<STRUCT(key VARCHAR, value VARCHAR, idx INTEGER)> for exact key/value/index records; attributes_map = TRUE remains the backward-compatible raw scalar map
expose bundled read_gff(..., strict := true) through rduckhts_gff(strict = TRUE) and rduckhts_gff_multi(strict = TRUE), enabling GFF3 structural validation from R/DBI workflows, including wrong field counts and malformed attribute segments, while keeping the default GFF reader permissive for existing ingestion pipelines
extend bundled rduckhts_score() / bcftools_score(...) so summary_path can be a character vector or callers can use summaries_list_file; multiple TSV/SSF summaries are scored in one genotype scan, log_path can write per-PRS matching/audit counts for loaded, matched, allele-mismatch, and duplicate markers, summaries_list_file directory scans are deterministic and ignore index sidecars, generated score/count column names are validated for uniqueness, and score accumulation now follows upstream bcftools +score float32 summation more closely
collapse the generated package README function catalog behind a disclosure widget so package users can jump to quick-start and workflow examples more easily
refresh the package README release docs: clarify the bundled htslib 1.23.1/system-requirements wording and redact transient temp-file paths in rendered example output so regenerated README diffs stay deterministic
add bundled duckhts_cgranges_overlaps_list(...), a vectorized scalar overlap expander that returns LIST-of-STRUCT hit records so DBI queries can expand provider rows with UNNEST(...) without generated bulk-probe SQL; package tests cover one-row-per-hit expansion over regular tables and bundled BED data, and the existing duckhts_cgranges_overlaps_bulk(...) probe path now also handles DuckDB string vector lengths safely
fix bundled duckhts_cgranges_from_query(...) ingestion of DuckDB string vectors by respecting string lengths instead of assuming NUL-terminated buffers; this fixes cgranges construction from providers such as read_bed(...) with longer chromosome names and adds package regression coverage
add bundled vectorized scalar cgranges probe helpers duckhts_cgranges_has_overlap(...) and duckhts_cgranges_count_overlaps(...), enabling DBI queries to stream provider rows through an already-finalized session cgranges index for filtering/count annotations without the materializing overlaps_bulk query-string path; add package-level coverage for overlap, contain, and NULL probe semantics
add bundled duckhts_cgranges_overlaps_bulk(...) for SQL-first bulk cgranges probing from R/DBI sessions: one table-function call now streams a query of probe intervals through a finalized cgranges index, supports mode = 'overlap'|'contain', accepts an optional query_row_id_col, and otherwise emits 1-based probe ordinals as query_row_id; add package-level regression coverage for the new bulk path
document bundled duckhts_cgranges_* entry points in the generated function catalog and package README, add bundled DBI smoke coverage for the session-scoped cgranges registry API, and include a packaged overlap-conformance script reference for bedtk-style parity checks
fix bundled rduckhts_fasta_nuc() / fasta_nuc(...) GC and AT percentages for intervals containing N: pct_gc and pct_at now use only informative A/C/G/T bases in the denominator, so ambiguous bases no longer depress reported bin/interval composition percentages; add bundled regression coverage
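The denominator change described above can be illustrated with a short Python sketch. It assumes case-insensitive counting and returns fractions between 0 and 1; the scale and case handling of the package's own pct_gc / pct_at columns are not stated in this entry, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def gc_at_fraction(seq: str) -> Tuple[Optional[float], Optional[float]]:
    """GC and AT fractions using only A/C/G/T bases in the denominator."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    informative = gc + at  # N and other ambiguity codes are excluded
    if informative == 0:
        return None, None  # nothing informative to report
    return gc / informative, at / informative
```

With the interval `GCNNAT`, dividing by the full length would report one third GC, while the informative-base denominator reports one half.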
add bundled C-built cgranges bulk-ingest support via duckhts_cgranges_from_query(...), which runs the source query on an extension-owned DuckDB connection and builds the cgranges index in C before publishing it to the session registry; duckhts_cgranges_from_table(...) remains deferred for now
bundle htslib 1.23.1 in the package for the upstream CRAM decoder and GZI validation security fixes, including the wasm/browser-exposed parsing path shipped through Rduckhts
add rduckhts_bam_bed_coverage(), bundling native duckhts_bam_bed_coverage(...) for samtools coverage-like regional summaries over BED targets with DuckHTS-specific pre/post-filter columns and read-mode strand-specific post summaries; bundled SQL/tinytest coverage now checks expected outputs on the packaged mixed BAM fixture, and fragment_mode / processing_threads are exposed but currently reserved for later phases
reduce bundled rduckhts_bam_bed_coverage() / duckhts_bam_bed_coverage(...) peak memory by allocating and freeing per-region working depth buffers during scan processing instead of retaining them for the whole BED, tile large target intervals internally when computing covered-base breadth, keep the tiled implementation single-pass, align min_depth > 1 mean-depth behavior with samtools coverage, and expose decompression_threads so package callers can set htslib BAM/CRAM decode worker counts explicitly
add rduckhts_samtools_idxstats(), bundling native duckhts_samtools_idxstats(...) for samtools idxstats-compatible BAM/CRAM/SAM summaries with indexed BAM fast-paths and scan fallback; package SQL/tinytest coverage now checks BAM fast-path output, CRAM fallback output, explicit index_path, and overwrite errors
improve package-source hygiene for local development: ignore generated README.html, .Rcheck, staged duckhts_extension/htslib build outputs, wasm/webR harness byproducts, and stray root-level index files under r/Rduckhts/; add top-level make clean_local to purge the reproducible package-side artifacts
add processing_threads parameter to rduckhts_mosdepth() and bundled duckhts_mosdepth(...) for parallel contig processing: workers claim contigs atomically and write output in header order; on the NA12878 WGS benchmark with 2 processing threads, fast mode is 1.38x faster, default mode 1.40x faster, and fragment mode 1.61x faster than mosdepth v0.3.13, all byte-identical; new default is processing_threads = 2
change rduckhts_mosdepth() defaults to threads = 2 (decompression) and processing_threads = 2 (parallel contigs) for better out-of-the-box WGS performance
ship htslib public headers and static library in the installed package under duckhts_extension/htslib/{include,lib}/; add inst/htslib_config.R (generated from htslib_config.R.in at configure time) providing htslib_cflags(), htslib_libs(), htslib_rpath(), and htslib_version() for downstream R packages that link against the bundled htslib
fix configure.win to stage htslib headers into include/htslib/ alongside lib/, matching Unix configure
change bundled bam_bin_counts(...) / rduckhts_bam_bin_counts() to return a dense fixed-bin layout across each selected contig span, including zero-count bins up to the contig end instead of only observed bins; this gives downstream CNV/sample serializers stable per-contig bin shapes, and the package docs/tests now describe and validate the dense contract
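A dense fixed-bin layout of the kind described above can be sketched as follows. This Python example assumes 0-based read start positions and a trailing partial bin that is kept; both are assumptions made for illustration, and the function is not the bundled bam_bin_counts(...) implementation.

```python
from typing import Iterable, List


def dense_bin_counts(positions: Iterable[int], contig_length: int,
                     bin_width: int) -> List[int]:
    """Count positions per fixed-width bin, emitting zero-count bins up to the contig end."""
    if contig_length < 0 or bin_width < 1:
        raise ValueError("contig_length must be >= 0 and bin_width must be >= 1")
    n_bins = -(-contig_length // bin_width)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < contig_length:  # ignore positions outside the contig
            counts[pos // bin_width] += 1
    return counts
```

Because the output length depends only on the contig length and bin width, every sample yields the same per-contig shape regardless of where its reads fall.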
add rduckhts_bam_bin_counts() and bundle native bam_bin_counts(...) fixed-width BAM/CRAM binning in the package. The new wrapper exposes mapq, require_flags, exclude_flags, and rmdup = "none"|"flag"|"streaming" duplicate handling, always returns per-bin forward/reverse totals, and can add per-bin GC/MAPQ summaries via stats = "gc", "mq", or "gc,mq"; bundled extdata now includes the tiny WisecondorX BAM/CRAM fixtures used by the new SQL/R tests, and the package README now includes a native bin-count example
add rduckhts_mosdepth() examples to the package README, including windowed fragment coverage output and preview of the generated BED.gz regions file, and refresh the generated function-catalog text so the packaged mosdepth description matches the current v0.3.13 parity surface
Expand rduckhts_mosdepth() and bundled duckhts_mosdepth(...) to cover the pinned local mosdepth 0.3.13 option surface for indexed BAM/CRAM input: fragment_mode = TRUE now matches upstream --fragment-mode full-fragment insert coverage for proper pairs, default mode is supported with CIGAR-aware coverage plus mate-overlap correction, read_groups = "..." filters RG tags, min_frag_len / max_frag_len filter absolute template length, and use_median = TRUE switches by = "<window|bed>" outputs from mean to median; add bundled SQL/R/conformance coverage for BAM and CRAM fast/fragment/default/median cases.
Expand rduckhts_mosdepth() and bundled duckhts_mosdepth(...) fast-mode parity with quantize = "...", writing mosdepth-style .quantized.bed.gz + CSI output, and add bundled tests for quantized output plus explicit by = "<bed>" validation.
Expand rduckhts_mosdepth() and bundled duckhts_mosdepth(...) fast-mode parity with thresholds = "..." for by = "<window|bed>", writing mosdepth-style .thresholds.bed.gz + CSI outputs; also align window/BED mean accumulation and window-region distribution bucketing with upstream mosdepth's current implementation behavior, and add bundled SQL/R/native-conformance coverage for the new outputs.
Bundle upstream mosdepth edge-case fixtures (big, empty-tids, overlapping-pairs, ovl, nanopore, and related BED files) in inst/extdata/ for stronger mosdepth parity testing, and record Brent Pedersen as the original mosdepth author in the package metadata/copyright bundle.
Expand rduckhts_mosdepth() and bundled duckhts_mosdepth(...): the native mosdepth-compatible fast-mode rewrite now accepts indexed CRAM input via fasta = ... when required by htslib, and exposes precision_digits = 2 as an explicit wrapper argument instead of relying on the MOSDEPTH_PRECISION environment variable; add bundled BAM/CRAM tests plus explicit precision validation.
Expand README.Rmd with runnable compression/indexing examples covering rduckhts_bgzip(), rduckhts_bgunzip(), rduckhts_bam_index(), rduckhts_bcf_index(), and rduckhts_tabix_index(), then regenerate the rendered package README outputs.
Add decompression_threads to rduckhts_bam() and rduckhts_bam_multi(), matching the bundled read_bam(..., decompression_threads := 2) SQL parameter. The previous hardcoded htslib worker-thread count is now the documented default, and 0 disables per-file worker threads.
Speed up bundled zero-column COUNT(*) queries across the HTS readers: read_bam(...), read_bcf(...), read_tabix(...), read_gff(...), read_gtf(...), and indexed read_bed(...) now use index metadata for full-file count-only scans when DuckDB projects no output columns; read_fasta(...) uses faidx sequence counts when an index is available and otherwise counts FASTA headers directly; read_fastq(...) continues to count raw FASTQ records directly when no projected columns are needed, while preserving paired/interleaved validation errors.
Add multi-file reading wrappers: rduckhts_bam_multi, rduckhts_bcf_multi, rduckhts_fastq_multi, rduckhts_fasta_multi, rduckhts_bed_multi, rduckhts_tabix_multi, rduckhts_gff_multi, rduckhts_gtf_multi. Each follows the standard (con, table_name, files, ..., overwrite) convention, creates a DuckDB table with a filename column, and accepts an optional .params data.frame for per-file parameter overrides (e.g. per-sample regions or index paths). File expansion uses DuckDB's glob() so S3 URLs work transparently.
Add bundled hts_union_query(reader, pattern, params) SQL scalar macro for pure-SQL multi-file reading via SELECT * FROM query(hts_union_query('read_bam', '*.bam')).
Clarify the package README's browser/webR documentation: README.Rmd now covers the full Module.duckhtsWasmHttpConfig parameter set (headers, allowHosts, enforceHostAllowlist, withCredentials, allowInsecureAuth), explicitly notes that webR consumers can set that config from R via webr::eval_js() without editing the host page, and covers practical wasm/browser behaviors such as same-origin setup, CORS requirements, .csi to .tbi fallback, and non-fatal Range warnings under the local http.server harness.
Use one extension-owned Emscripten compatibility header in the package wasm/webR build: configure now includes the shared header from src/include/ via the bootstrapped inst/duckhts_extension/include/wasm_socket_compat.h copy, keeping the bundled browser build aligned with the extension sources without changing native package builds.
Make the bundled wasm extension self-contained with respect to htslib: the Emscripten/webR configure path now builds only libhts.a, links duckhts.duckdb_extension directly against that static archive, and no longer relies on runtime loading of bundled libhts.so* files in webR/browser environments.
Add a browser-native wasm http / https backend in the bundled extension: src/wasm_http_hfile.c now registers a synchronous XHR-backed htslib scheme handler from the DuckDB extension entry point, so browser wasm builds can read same-origin and CORS-enabled remote HTS URLs without going through libcurl sockets.
Keep wasm libcurl disabled in configure: r-wasm/webr ships /opt/webr/wasm/lib/libcurl.a and the emcc link test against it passes, but libcurl's connect() calls from a SIDE_MODULE still trigger a webR Emscripten message-bus error (resolved is not a function) on first network use, so the package-owned XHR backend is the supported wasm HTTP path.
Harden wasm browser HTTP range behavior in the bundled extension: wasm_http_hfile.c now caches object sizes from Content-Range/Content-Length, clamps range requests when size is known, short-circuits reads at/after EOF, and uses a GET Range: bytes=0-0 fallback for SEEK_END size discovery when HEAD metadata is unavailable; this avoids cross-origin 416 failures on .tbi index EOF probes (including GTEx tabix in webR/browser).
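The size-caching and clamping logic described above can be sketched in Python. This is a simplified model of the idea using standard HTTP header syntax, not a port of wasm_http_hfile.c; both function names are hypothetical.

```python
import re
from typing import Optional

_CONTENT_RANGE = re.compile(r"^bytes\s+(\d+)-(\d+)/(\d+|\*)$")


def total_size_from_content_range(value: str) -> Optional[int]:
    """Extract the total object size from a Content-Range value, if it is known."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None or match.group(3) == "*":
        return None
    return int(match.group(3))


def clamped_range_header(offset: int, length: int,
                         size: Optional[int]) -> Optional[str]:
    """Build a Range header value, or return None when no request should be sent."""
    if length <= 0:
        return None
    if size is not None:
        if offset >= size:
            return None  # at or after EOF: short-circuit instead of risking a 416
        length = min(length, size - offset)  # clamp to the known object size
    return f"bytes={offset}-{offset + length - 1}"
```

A `bytes=0-0` probe answered with `Content-Range: bytes 0-0/12345` yields the size 12345, after which reads near the end of the object are clamped rather than sent past EOF.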
Harden non-Range wasm/browser HTTP fallback in the bundled extension: when ranged reads receive 200 OK, wasm_http_hfile.c now caches the full object per open handle and serves later reads from that in-memory cache to avoid repeated full downloads, while still emitting one-time warnings when Range is ignored and when large fallback payloads (>=64 MiB) are used.
Add optional wasm/browser request-header configuration in the bundled extension via Module.duckhtsWasmHttpConfig: supports custom headers (including bearer auth), host allowlisting, optional withCredentials, and a default HTTPS-only guard that blocks Authorization on non-HTTPS URLs unless allowInsecureAuth is explicitly enabled.
Extend Module.duckhtsWasmHttpConfig with enforceHostAllowlist in the bundled wasm backend: when enabled, requests to hosts outside allowHosts are blocked instead of merely omitting configured headers.
Fix the bundled wasm side-module final link during configure: preserve webR/Emscripten ${LDFLAGS} on the final duckhts.duckdb_extension link so the SIDE_MODULE settings reach the extension itself, and export duckhts_init_c_api explicitly for DuckDB's loader. This fixes webR/browser rduckhts_load() failures where DuckDB could not find a usable init export in duckhts.duckdb_extension.
Set the bundled extension metadata platform to linux_i686_musl for the Emscripten/webR path in configure, matching the platform value used for browser-side loading tests.
Fix Wasm package builds under rwasm / r-universe: the package configure script now preserves injected NAME=VALUE cache overrides, forwards explicit --build / --host triplets into the vendored htslib ./configure, forwards webR's Emscripten port flags for zlib/bzip2, seeds wasm-safe Autoconf cache results for zlib/bzip2/socket probes, injects a tiny Emscripten-only socket compatibility shim for recv/send/closesocket, and disables the optional htslib features that are not available in the stock webR/r-universe wasm toolchain (libcurl, S3, GCS, lzma, plugins); this fixes the original ac_cv_func_getrandom=no: command not found failure and the subsequent nested htslib cross-compile probe failures without changing native configure behavior.
Fix bundled wasm extension artifacts: the package/browser wasm build now includes vendored htslib in the linked archive, avoiding unresolved symbols such as bcf_readrec at LOAD.

Rduckhts 1.1.6-0.0.2 (2026-04-09)

Fix test_bam_file_offset: cast COUNT(*) results to INTEGER in SQL so the DuckDB driver returns R integer rather than numeric (BIGINT maps to double in the duckdb R driver), restoring expect_identical assertions.

Rduckhts 1.1.6-0.0.1

Fix bundled read_hts_index_spans(...) / rduckhts_hts_index_spans(): the span view now returns real chunk rows from CSI/TBI/BAI indexes, including populated bin, chunk_beg_vo, chunk_end_vo, chunk_bytes, seq_start, and seq_end values instead of placeholder NAs; BCF-backed calls also avoid the previous noisy tbx probe warning on .csi indexes.
Add FILE_OFFSET column to rduckhts_bam() / read_bam(...): exposes the BGZF virtual file offset after each record. Zero runtime overhead (macro over already-open struct fields). Enables ORDER BY FILE_OFFSET in SQL LAG() / LAST_VALUE() window functions to reproduce exact BAM file order for streaming deduplication algorithms. Together with the // integer-division operator and LAST_VALUE(... IGNORE NULLS), this permits exact replication of WisecondorX's larp/larp2 state machine in pure SQL, confirmed at 0 mismatches across 25,115 non-zero bins on a real NIPT BAM.
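Ordering by a BGZF virtual file offset reproduces file order because of how the SAM/BAM specification packs it: the compressed block offset sits in the upper 48 bits and the offset within the uncompressed block in the lower 16 bits. A minimal Python sketch of that layout; the helper names are illustrative and are not functions exposed by the package.

```python
from typing import Tuple


def make_virtual_offset(coffset: int, uoffset: int) -> int:
    """Pack a compressed block offset and an in-block offset into one integer."""
    if not 0 <= uoffset < (1 << 16):
        raise ValueError("uoffset must fit in 16 bits")
    if not 0 <= coffset < (1 << 48):
        raise ValueError("coffset must fit in 48 bits")
    return (coffset << 16) | uoffset


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Return (coffset, uoffset) for a BGZF virtual file offset."""
    return voffset >> 16, voffset & 0xFFFF
```

Comparing packed values as plain integers gives the same order as comparing (coffset, uoffset) pairs, so a single integer column is enough for an ORDER BY.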

Rduckhts 1.1.5-0.0.1

Fix bundled bcftools_liftover(...) / rduckhts_liftover() cache and realignment hardening: per-thread chain/FASTA contexts are now bounded instead of accumulating for the lifetime of worker threads, and scalar left-alignment no longer reuses stale traceback state after failed/empty alignments.
Fix bundled read_bam(...) / rduckhts_bam() and read_bcf(...) / rduckhts_bcf() indexed parallel full scans when headers contain leading empty contigs: contig claiming now retries iteratively instead of recursively, and the BAM reader no longer returns an empty chunk after successfully handing off to the next contig.
Fix bundled Windows builds under MinGW and Rtools: vendored htslib configuration now distinguishes windows_amd64_mingw from windows_amd64_rtools, keeping the smaller configure.win-style library set on MinGW while restoring the fuller static libcurl dependency closure needed on Rtools. CURL_STATICLIB remains on built objects rather than ./configure probes.
Fix bundled Windows windows_amd64_rtools builds: the package build now pins CC/AR/RANLIB from R CMD config, avoiding mixed compiler/library selection when vendored htslib is configured, and keeps the MinGW static-libcurl configuration aligned with Rtools libcurl.a.
Fix bundled read_bcf(...) / rduckhts_bcf() mapping of fixed-count INFO/FORMAT arrays: exact-cardinality fields such as Number=2 and Number=4 now materialize as DuckDB array/list columns instead of silently dropping all but the first value.
Fix bundled read_bcf(...) / rduckhts_bcf() handling of string FORMAT lists such as DRAGEN FORMAT/LAA: Number != 1 string FORMAT fields now materialize as VARCHAR[] instead of triggering DuckDB internal assertion failures.
Fix bundled duckdb_munge(...) / rduckhts_munge() multithreaded FASTA lookups: FASTA index handles are now thread-local and FASTA fetches are synchronized in munge, avoiding intermittent fai_retrieve failures and aborts when fasta_ref is used with PRAGMA threads > 1.
Add rduckhts_score(): polygenic risk score computation backed by the bcftools +score plugin, supporting GT/DS/HDS/AP/GP/AS dosage modes, all major GWAS summary presets (PLINK, PLINK2, REGENIE, SAIGE, BOLT, METAL, PGS, SSF/GWAS-SSF), GWAS-VCF multi-PRS scoring, p-value thresholding, sample subsetting, and region/filter controls.
Add rduckhts_munge(): GWAS summary statistics normalization backed by bcftools +munge, with FASTA reference allele resolution, swap-aware effect/frequency transforms, and METAL meta-analysis column support.
Add rduckhts_liftover(): variant coordinate liftover backed by bcftools +liftover using UCSC chain files, with full indel normalization, INFO/END lifting, and MT passthrough.
Add rduckhts_bed() for BED3–BED12 interval files and rduckhts_fasta_nuc() for nucleotide composition over BED intervals or fixed-width bins.
Add compression and index helpers: rduckhts_bgzip(), rduckhts_bgunzip(), rduckhts_bam_index(), rduckhts_bcf_index(), and rduckhts_tabix_index().
Add HTS metadata readers: rduckhts_hts_header(), rduckhts_hts_index(), rduckhts_hts_index_spans(), and rduckhts_hts_index_raw().
Add quality encoding controls to rduckhts_bam() and rduckhts_fastq() (quality_representation, input_quality_encoding) and rduckhts_detect_quality_encoding() for heuristic FASTQ encoding detection.
Add sequence_encoding := 'nt16' parameter to rduckhts_bam(), rduckhts_fasta(), and rduckhts_fastq() for raw htslib nt16 sequence output as UTINYINT[].
Add SAM flag helpers sam_flag_bits() and sam_flag_has(), CIGAR utility functions, and is_forward_aligned().
Bundle duckhts 1.1.5 extension.

Rduckhts 0.1.3-0.0.2.9000

Rduckhts 0.1.3-0.0.2 (2026-02-24)

Conditionally enable plugins on Windows
Update the configure script to avoid a check failure on CRAN macOS
Update the extension version to 0.1.3

Rduckhts 0.1.2-0.1.5

Fixed inadvertent removal of libexec
Updated the plugin to add header table functions

Rduckhts 0.1.2-0.1.4 (2026-02-23)

CRAN Submission

Rduckhts 0.1.2-0.0.9000

Various fixes for CRAN submission:
- Updated DESCRIPTION Title/Description formatting and added HTSlib reference.
- Removed default write paths in bootstrap/build helpers; now require explicit paths.
- setup_hts_env now accepts an explicit plugins_dir parameter.
- duckhts_build now accepts a make argument (GNU make required).
Modified configure to attempt to support wasm.
Update bootstrapped extension code to match duckhts 0.1.2.
Add SAMtags + auxiliary tag support (standard_tags, auxiliary_tags).
Add tabix header/typing options (header, header_names, auto_detect, column_types).

Rduckhts 0.1.1-0.0.3

make the build single-threaded

misspelling correction

Rduckhts 0.1.1-0.0.2

CRAN resubmission: apply DuckDB C API header patch to avoid strict-prototypes warnings.

Rduckhts 0.1.1-0.0.1

CRAN Submission
Bump bundled duckhts extension version to 0.1.1.
Initial development release.
Bundles the DuckHTS DuckDB extension and htslib for HTS file readers.
Adds table-creation helpers for VCF/BCF, BAM/CRAM, FASTA/FASTQ, GFF/GTF, and tabix.