rduckhts_load() now expose DuckDB runtime type-support probes for VARIANT and GEOMETRY
rduckhts_bcftools_norm() / duckhts_bcftools_norm(...) site-preserving table-macro query shape by removing the extra correlated scalar LATERAL subquery around bcftools_norm_row(...), eliminating the site-preserving LEFT_DELIM_JOIN plan overhead while preserving split-mode ALT row semantics and caller columns whose names collide with DuckHTS helper-column names used internally by earlier macro forms; add tinytest coverage for DuckDB's suffixed behavior when callers already have normalized-output column names
scan_mode = "auto"|"sequential" controls through the R wrappers and multi-file helpers for read_bcf, read_bam, read_fasta, read_fastq, read_bed, read_gff, read_gtf, and read_tabix, so callers can force full-file streaming/counting instead of index-backed count or parallel scan paths where applicable; sequential mode is rejected for region queries
bcftools_norm_row(...) / rduckhts_bcftools_norm() for already-normalized plain ACGTN allele rows by skipping kstring left-realignment setup when trim predicates prove the row is unchanged; avoid per-row FASTA path duplication after the vector-local cache is established, reuse larger bounded per-thread reference windows, and document/defensively serialize htslib FASTA fetches while keeping normalization reference caches thread-local to avoid the faidx cache race class fixed in https://github.com/RGenomicsETL/duckhts/issues/17 / https://github.com/RGenomicsETL/duckhts/pull/18
rduckhts_bcftools_norm() / duckhts_bcftools_norm(...)
gVCF-aware for vt/vcfnorm-style row normalization: <NON_REF> and <*> reference-block alleles now pass through with GVCFReferenceBlock, and mixed real-plus-gVCF-symbolic alleles normalize the real alleles while preserving symbolic alleles and caller-supplied reference-block END in site-preserving output; mixed * plus real alleles now follows the same ignored-symbolic path, while *-only rows remain SpanningDeletion; bundled phased GT/PL/GP/DS/PS FORMAT fixtures, including haploid/triploid/tetraploid Number=G cardinality cases, and tinytests pin phase-separator preservation through read_bcf(...)
rduckhts_bcf_convert_parquet(), rduckhts_bam_convert_parquet(), rduckhts_gff_convert_parquet(), and rduckhts_tabix_convert_parquet() around the bundled extension SQL builders duckhts_*_convert_parquet_sql(...); these convert DuckHTS scans to Parquet with DuckHTS write-format metadata, preserved raw headers, optional corrected header text, SQL-filter provenance, selected-column/partition metadata, arbitrary user metadata via R named lists/extension metadata := map(...), optional caller-managed JSON-file metadata when DuckDB's json extension is available, and partitioned-output support for DuckLake-style registration of premade Parquet files
#CHROM/sample header line in bundled read_hts_header(..., mode := 'raw'), so Parquet metadata written from VCF/BCF inputs has the complete header needed for future VCF/BCF regeneration
rduckhts_simd_kernel_info(), keeping R wrappers thin while reporting one row per logical kernel and preserving backend-agnostic SQL/R conformance tests for seq_gc_content(...)
selectable versus available diagnostics in generated docs, and preserve ASCII SQL quotes for the rendered duckhts_simd_set_backend('auto'|'scalar'|backend) catalog call
HAVE_* macro guards from all bundled SIMD backend translation units; compile-time gate is now defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__)) for x86 backends, available without autoconf; runtime dispatch and
scalar fallback behavior are unchanged; add scalar-vs-auto backend R correctness tests for rduckhts_simd_set_backend() / seq_gc_content(...) covering GC=0/0.5/1.0, embedded-N calling, and soft-masked lowercase bases
.validate_simd_backend() R helper from SIMD wrapper functions; backend-name normalization and validation now belong to the extension, while the R wrappers only enforce that backend is a single non-missing character string
rduckhts_simd_backend(), rduckhts_simd_requested_backend(), rduckhts_simd_backend_available(), and rduckhts_simd_set_backend(), route bundled seq_gc_content(...) through the new eager scalar/optional-AVX2 runtime dispatch scaffold while preserving scalar fallback behavior on ARM, wasm, and scalar-only builds, add runtime-gated AVX-512, ARM NEON, and wasm SIMD128 backend translation units where compiler-supported, keep the manual duckhts_build() rebuild path wired to the SIMD sources, and add README examples for the scalar/auto SIMD flow
rduckhts_liftover() / bcftools_liftover(...) FASTA contig alias handling during source/destination reference validation and sequence fetches, and align bundled spanning-deletion * allele handling with upstream bcftools +liftover: inputs such as 23, 24, 26, X, Y, MT, and chr* aliases now resolve through the same canonical path, avoiding spurious SourceRefMismatch rejects for X/Y/MT indels when the bundled chain names and FASTA names differ only by canonical aliasing; bundled *-allele rows now follow upstream swap/ref-add semantics instead of taking the symbolic short-circuit path, full-file GIAB conformance against installed bcftools +liftover is now exact, and bundled SQL/tinytest coverage now pins the 23 -> chrX, SWAP=2, and SWAP=-1 regressions
rduckhts_load() sessions for both the bcftools-style and raw upstream numeric surfaces: variantkey(...) now matches bcftools %VKX / +add-variantkey on 1-based VCF rows, large/ambiguous/symbolic alleles keep the official hashed nonreversible mode, regionkey(...)
adds 0-based half-open span keys plus overlap helpers, bundled tinytests pin reversible and hashed cases, and the package README now includes concrete DBI examples for VariantKey / RegionKey usage
rduckhts_bcftools_norm(..., split_multiallelic = TRUE) row preservation for ref-only and empty-ALT inputs: rows with ALT='.', NULL ALT values, empty ALT lists, or NULL ALT list elements no longer disappear from split-mode DBI results, bundled tinytest coverage now pins the expected RefOnly / NullInput statuses and alt_index behavior, bundled rduckhts_bcf() / rduckhts_bcf_multi() now expose decompression_threads = 0 for explicit htslib worker-thread control on bgzipped VCF/BCF reads, and the package README now includes a concrete normalization example
rduckhts_fasta_index() now returns the generated .fai path when index_path = NULL instead of an empty string, bundled regression coverage now also pins default-path returns for BGZF compression/decompression and BAM/BCF/tabix index builders, and the rduckhts_bgzip() / rduckhts_bgunzip() wrappers now correctly propagate keep = FALSE instead of silently falling back to the extension default keep := TRUE
rduckhts_bcftools_norm() and bundle bcftools_norm_row(...) / duckhts_bcftools_norm(...) for bcftools/vt-style FASTA-backed variant normalization from DBI queries: ALT inputs may be either comma-delimited VARCHAR or VARCHAR[], the bundled result appends pos_normed, end_pos_normed, ref_normed, alt_normed, normed, and norm_status, split mode emits one row per ALT with alt_index, and bundled SQL/tinytest coverage now exercises sequence, multiallelic, symbolic <DEL>/<DUP>, and missing-contig rows
rduckhts_liftover() / bcftools_liftover(...)
indel parity in two exact upstream rewrite points: repeat-run source extension now keeps extending across the cached source-reference window boundary when needed, and the bundled clip-pad Needleman-Wunsch path now keeps the best shift even when candidate alignment scores are negative instead of leaving padded intervals unshifted; bundled SQL/tinytest coverage now includes dedicated repeat-run and clip-pad regression fixtures, and the real-data conformance workflow reaches exact parity with installed bcftools +liftover on GIAB HG001 chr20 plus the full HG006 GRCh37 benchmark VCF
rduckhts_liftover() / bcftools_liftover(...) row rejection for invalid source-reference indel and difficult-SNP inputs: rows that fail the source-FASTA validation path now stay in the result with mapped = FALSE and reject_reason = 'SourceRefMismatch' instead of fabricating padded lifted alleles or aborting the query; bundled tests and README examples now reflect the reject-row behavior
rduckhts_pileup() and bundle native read_pileup(...) for region-scoped BAM pileups with per-position chrom, pos, depth, bases, and quals; expose bundled read_bam(..., cigar_representation := 'binary') through rduckhts_bam(..., cigar_representation = "binary") and multi-file BAM wrappers, returning packed BAM CIGAR ops as UINTEGER[]; and expose explicit gzi_path arguments in rduckhts_fasta(), rduckhts_fasta_multi(), and rduckhts_fasta_nuc() so packaged bgzipped FASTA workflows can use relocated .gzi sidecars
rduckhts_fasta_nuc() / fasta_nuc(...)
nucleotide counting on capable x86_64 hosts with an AVX2+popcnt fast path selected via htslib-style runtime dispatch, while preserving the scalar fallback everywhere else
rduckhts_bam_index(): native remote BAM/BCF/tabix/FASTA/BED reads now apply htslib block/cache tuning by access pattern, while wasm/browser builds use the same policy with smaller budgets appropriate for the XHR-backed worker runtime; the bundled vendored htslib also now exposes a pre-opened sam_index_build4(...) entry point so bam_index(...) can be tuned before remote index construction begins
rduckhts_bcf() / read_bcf(...) scanning stability for records where FILTER lists were emitted without reserving list-vector capacity, which could crash with allocator corruption (double free/invalid pointer) during full-table reads; FILTER entries now reserve child-list space before writes and scans are stable on files previously triggering crashes
-Wpedantic during Unix and Windows package builds while leaving vendored htslib on its upstream warning flags
wasm_http_hfile.c translation unit so native package builds do not warn about an empty source file under pedantic C diagnostics
configure.win libcurl detection: the package now requires a successful curl_easy_init link using the detected pkg-config libcurl dependency closure before enabling htslib remote URL support, and otherwise disables libcurl/S3/GCS cleanly
rduckhts_gff() / rduckhts_gtf() and multi-file wrappers: attributes_list = TRUE returns MAP(VARCHAR, VARCHAR[]) with grouped multi-values and GFF3 percent-decoding, while attributes_pairs = TRUE returns LIST<STRUCT(key VARCHAR, value VARCHAR, idx INTEGER)> for exact key/value/index records; attributes_map = TRUE remains the backward-compatible raw scalar map
read_gff(..., strict := true) through rduckhts_gff(strict = TRUE) and rduckhts_gff_multi(strict = TRUE), enabling GFF3 structural validation from R/DBI workflows, including wrong field counts and malformed attribute segments, while keeping the
default GFF reader permissive for existing ingestion pipelines
rduckhts_score() / bcftools_score(...) so summary_path can be a character vector or callers can use summaries_list_file; multiple TSV/SSF summaries are scored in one genotype scan, log_path can write per-PRS matching/audit counts for loaded, matched, allele-mismatch, and duplicate markers, summaries_list_file directory scans are deterministic and ignore index sidecars, generated score/count column names are validated for uniqueness, and score accumulation now follows upstream bcftools +score float32 summation more closely
htslib 1.23.1/system-requirements wording and redact transient temp-file paths in rendered example output so regenerated README diffs stay deterministic
duckhts_cgranges_overlaps_list(...), a vectorized scalar overlap expander that returns LIST-of-STRUCT hit records so DBI queries can expand provider rows with UNNEST(...) without generated bulk-probe SQL; package tests cover one-row-per-hit expansion over regular tables and bundled BED data, and the existing duckhts_cgranges_overlaps_bulk(...) probe path now also handles DuckDB string vector lengths safely
duckhts_cgranges_from_query(...) ingestion of DuckDB string vectors by respecting string lengths instead of assuming NUL-terminated buffers; this fixes cgranges construction from providers such as read_bed(...) with longer chromosome names and adds package regression coverage
duckhts_cgranges_has_overlap(...) and duckhts_cgranges_count_overlaps(...), enabling DBI queries to stream provider rows through an already-finalized session cgranges index for filtering/count annotations without the materializing overlaps_bulk query-string path; add package-level coverage for overlap, contain, and NULL probe semantics
duckhts_cgranges_overlaps_bulk(...)
for SQL-first bulk cgranges probing from R/DBI sessions: one table-function call now streams a query of probe intervals through a finalized cgranges index, supports mode = 'overlap'|'contain', accepts an optional query_row_id_col, and otherwise emits 1-based probe ordinals as query_row_id; add package-level regression coverage for the new bulk path
duckhts_cgranges_* entry points in the generated function catalog and package README, add bundled DBI smoke coverage for the session-scoped cgranges registry API, and include a packaged overlap-conformance script reference for bedtk-style parity checks
rduckhts_fasta_nuc() / fasta_nuc(...) GC and AT percentages for intervals containing N: pct_gc and pct_at now use only informative A/C/G/T bases in the denominator, so ambiguous bases no longer depress reported bin/interval composition percentages; add bundled regression coverage
duckhts_cgranges_from_query(...), which runs the source query on an extension-owned DuckDB connection and builds the cgranges index in C before publishing it to the session registry; duckhts_cgranges_from_table(...) remains deferred for now
Rduckhts
rduckhts_bam_bed_coverage(), bundling native duckhts_bam_bed_coverage(...) for samtools coverage-like regional summaries over BED targets with DuckHTS-specific pre/post-filter columns and read-mode strand-specific post summaries; bundled SQL/tinytest coverage now checks expected outputs on the packaged mixed BAM fixture, and fragment_mode / processing_threads are exposed but currently reserved for later phases
rduckhts_bam_bed_coverage() / duckhts_bam_bed_coverage(...)
peak memory by allocating and freeing per-region working depth buffers during scan processing instead of retaining them for the whole BED, tile large target intervals internally when computing covered-base breadth, keep the tiled implementation single-pass, align min_depth > 1 mean-depth behavior with samtools coverage, and expose decompression_threads so package callers can set htslib BAM/CRAM decode worker counts explicitly
rduckhts_samtools_idxstats(), bundling native duckhts_samtools_idxstats(...) for samtools idxstats-compatible BAM/CRAM/SAM summaries with indexed BAM fast-paths and scan fallback; package SQL/tinytest coverage now checks BAM fast-path output, CRAM fallback output, explicit index_path, and overwrite errors
README.html, .Rcheck, staged duckhts_extension/htslib build outputs, wasm/webR harness byproducts, and stray root-level index files under r/Rduckhts/; add top-level make clean_local to purge the reproducible package-side artifacts
processing_threads parameter to rduckhts_mosdepth() and bundled duckhts_mosdepth(...) for parallel contig processing: workers claim contigs atomically and write output in header order; on the NA12878 WGS benchmark with 2 processing threads, fast mode is 1.38x faster, default mode 1.40x faster, and fragment mode 1.61x faster than mosdepth v0.3.13, all byte-identical; new default is processing_threads = 2
rduckhts_mosdepth() defaults to threads = 2 (decompression) and processing_threads = 2 (parallel contigs) for better out-of-the-box WGS performance
duckhts_extension/htslib/{include,lib}/; add inst/htslib_config.R (generated from htslib_config.R.in at configure time) providing htslib_cflags(), htslib_libs(), htslib_rpath(), and htslib_version() for downstream R packages that link against the bundled htslib
configure.win to stage htslib headers into include/htslib/ alongside lib/, matching Unix configure
bam_bin_counts(...)
/ rduckhts_bam_bin_counts() to return a dense fixed-bin layout across each selected contig span, including zero-count bins up to the contig end instead of only observed bins; this gives downstream CNV/sample serializers stable per-contig bin shapes, and the package docs/tests now describe and validate the dense contract
rduckhts_bam_bin_counts() and bundle native bam_bin_counts(...) fixed-width BAM/CRAM binning in the package. The new wrapper exposes mapq, require_flags, exclude_flags, and rmdup = "none"|"flag"|"streaming" duplicate handling, always returns per-bin forward/reverse totals, and can add per-bin GC/MAPQ summaries via stats = "gc", "mq", or "gc,mq"; bundled extdata now includes the tiny WisecondorX BAM/CRAM fixtures used by the new SQL/R tests, and the package README now includes a native bin-count example
rduckhts_mosdepth() examples to the package README, including windowed fragment coverage output and preview of the generated BED.gz regions file, and refresh the generated function-catalog text so the packaged mosdepth description matches the current v0.3.13 parity surface
rduckhts_mosdepth() and bundled duckhts_mosdepth(...) to cover the pinned local mosdepth 0.3.13 option surface for indexed BAM/CRAM input: fragment_mode = TRUE now matches upstream --fragment-mode full-fragment insert coverage for proper pairs, default mode is supported with CIGAR-aware coverage plus mate-overlap correction, read_groups = "..." filters RG tags, min_frag_len / max_frag_len filter absolute template length, and use_median = TRUE switches by = "<window|bed>" outputs from mean to median; add bundled SQL/R/conformance coverage for BAM and CRAM fast/fragment/default/median cases.
rduckhts_mosdepth() and bundled duckhts_mosdepth(...) fast-mode parity with quantize = "...", writing mosdepth-style .quantized.bed.gz + CSI output, and add bundled tests for quantized output plus explicit by = "<bed>" validation.
rduckhts_mosdepth() and bundled duckhts_mosdepth(...)
fast-mode parity with thresholds = "..." for by = "<window|bed>", writing mosdepth-style .thresholds.bed.gz + CSI outputs; also align window/BED mean accumulation and window-region distribution bucketing with upstream mosdepth's current implementation behavior, and add bundled SQL/R/native-conformance coverage for the new outputs.
big, empty-tids, overlapping-pairs, ovl, nanopore, and related BED files) in inst/extdata/ for stronger mosdepth parity testing, and record Brent Pedersen as the original mosdepth author in the package metadata/copyright bundle.
rduckhts_mosdepth() and bundled duckhts_mosdepth(...): the native mosdepth-compatible fast-mode rewrite now accepts indexed CRAM input via fasta = ... when required by htslib, and exposes precision_digits = 2 as an explicit wrapper argument instead of relying on the MOSDEPTH_PRECISION environment variable; add bundled BAM/CRAM tests plus explicit precision validation.
README.Rmd with runnable compression/indexing examples covering rduckhts_bgzip(), rduckhts_bgunzip(), rduckhts_bam_index(), rduckhts_bcf_index(), and rduckhts_tabix_index(), then regenerate the rendered package README outputs.
decompression_threads to rduckhts_bam() and rduckhts_bam_multi(), matching the bundled read_bam(..., decompression_threads := 2) SQL parameter. The previous hardcoded htslib worker-thread count is now the documented default, and 0 disables per-file worker threads.
COUNT(*) queries across the HTS readers: read_bam(...), read_bcf(...), read_tabix(...), read_gff(...), read_gtf(...), and indexed read_bed(...) now use index metadata for full-file count-only scans when DuckDB projects no output columns; read_fasta(...) uses faidx sequence counts when an index is available and otherwise counts FASTA headers directly; read_fastq(...)
continues to count raw FASTQ records directly when no projected columns are needed, while preserving paired/interleaved validation errors.
rduckhts_bam_multi, rduckhts_bcf_multi, rduckhts_fastq_multi, rduckhts_fasta_multi, rduckhts_bed_multi, rduckhts_tabix_multi, rduckhts_gff_multi, rduckhts_gtf_multi. Each follows the standard (con, table_name, files, ..., overwrite) convention, creates a DuckDB table with a filename column, and accepts an optional .params data.frame for per-file parameter overrides (e.g. per-sample regions or index paths). File expansion uses DuckDB's glob() so S3 URLs work transparently.
hts_union_query(reader, pattern, params) SQL scalar macro for pure-SQL multi-file reading via SELECT * FROM query(hts_union_query('read_bam', '*.bam')).
README.Rmd now covers the full Module.duckhtsWasmHttpConfig parameter set (headers, allowHosts, enforceHostAllowlist, withCredentials, allowInsecureAuth), explicitly notes that webR consumers can set that config from R via webr::eval_js() without editing the host page, and covers practical wasm/browser behaviors such as same-origin setup, CORS requirements, .csi to .tbi fallback, and non-fatal Range warnings under the local http.server harness.
configure now includes the shared header from src/include/ via the bootstrapped inst/duckhts_extension/include/wasm_socket_compat.h copy, keeping the bundled browser build aligned with the extension sources without changing native package builds.
htslib: the Emscripten/webR configure path now builds only libhts.a, links duckhts.duckdb_extension directly against that static archive, and no longer relies on runtime loading of bundled libhts.so* files in webR/browser environments.
http / https backend in the bundled extension: src/wasm_http_hfile.c now registers a synchronous XHR-backed htslib scheme handler from the DuckDB extension entry point, so browser wasm builds can read same-origin and CORS-enabled remote HTS URLs without going through libcurl sockets.
libcurl disabled in
configure: r-wasm/webr ships /opt/webr/wasm/lib/libcurl.a and the emcc link test against it passes, but libcurl's connect() calls from a SIDE_MODULE still trigger a webR Emscripten message-bus error (resolved is not a function) on first network use, so the package-owned XHR backend is the supported wasm HTTP path.
wasm_http_hfile.c now caches object sizes from Content-Range/Content-Length, clamps range requests when size is known, short-circuits reads at/after EOF, and uses a GET Range: bytes=0-0 fallback for SEEK_END size discovery when HEAD metadata is unavailable; this avoids cross-origin 416 failures on .tbi index EOF probes (including GTEx tabix in webR/browser).
200 OK, wasm_http_hfile.c now caches the full object per open handle and serves later reads from that in-memory cache to avoid repeated full downloads, while still emitting one-time warnings when Range is ignored and when large fallback payloads (>=64 MiB) are used.
Module.duckhtsWasmHttpConfig: supports custom headers (including bearer auth), host allowlisting, optional withCredentials, and a default HTTPS-only guard that blocks Authorization on non-HTTPS URLs unless allowInsecureAuth is explicitly enabled.
Module.duckhtsWasmHttpConfig with enforceHostAllowlist in the bundled wasm backend: when enabled, requests to hosts outside allowHosts are blocked instead of merely omitting configured headers.
configure: preserve webR/Emscripten ${LDFLAGS} on the final duckhts.duckdb_extension link so the SIDE_MODULE settings reach the extension itself, and export duckhts_init_c_api explicitly for DuckDB's loader.
This fixes webR/browser rduckhts_load() failures where DuckDB could not find a usable init export in duckhts.duckdb_extension.
linux_i686_musl for the Emscripten/webR path in configure, matching the platform value you are using for browser-side loading tests.
rwasm / r-universe: the package configure script now preserves injected NAME=VALUE cache overrides, forwards explicit --build / --host triplets into the vendored htslib ./configure, forwards webR's Emscripten port flags for zlib/bzip2, seeds wasm-safe Autoconf cache results for zlib/bzip2/socket probes, injects a tiny Emscripten-only socket compatibility shim for recv/send/closesocket, and disables the optional htslib features that are not available in the stock webR/r-universe wasm toolchain (libcurl, S3, GCS, lzma, plugins); this fixes the original ac_cv_func_getrandom=no: command not found failure and the subsequent nested htslib cross-compile probe failures without changing native configure behavior.
htslib in the linked archive, avoiding unresolved symbols such as bcf_readrec at LOAD.
test_bam_file_offset: cast COUNT(*) results to INTEGER in SQL so the DuckDB driver returns R integer rather than numeric (BIGINT maps to double in the duckdb R driver), restoring expect_identical assertions.
read_hts_index_spans(...) / rduckhts_hts_index_spans(): the span view now returns real chunk rows from CSI/TBI/BAI indexes, including populated bin, chunk_beg_vo, chunk_end_vo, chunk_bytes, seq_start, and seq_end values instead of placeholder NAs; BCF-backed calls also avoid the previous noisy tbx probe warning on .csi indexes.
FILE_OFFSET column to rduckhts_bam() / read_bam(...): exposes the BGZF virtual file offset after each record. Zero runtime overhead (macro over already-open struct fields). Enables ORDER BY FILE_OFFSET in SQL LAG() / LAST_VALUE() window functions to reproduce exact BAM file order for streaming deduplication algorithms. Together with the // integer-division operator and LAST_VALUE(...
IGNORE NULLS), this permits exact replication of WisecondorX's larp/larp2 state machine in pure SQL, confirmed at 0 mismatches across 25,115 non-zero bins on a real NIPT BAM.
bcftools_liftover(...) / rduckhts_liftover() cache and realignment hardening: per-thread chain/FASTA contexts are now bounded instead of accumulating for the lifetime of worker threads, and scalar left-alignment no longer reuses stale traceback state after failed/empty alignments.
read_bam(...) / rduckhts_bam() and read_bcf(...) / rduckhts_bcf() indexed parallel full scans when headers contain leading empty contigs: contig claiming now retries iteratively instead of recursively, and the BAM reader no longer returns an empty chunk after successfully handing off to the next contig.
htslib configuration now distinguishes windows_amd64_mingw from windows_amd64_rtools, keeping the smaller configure.win-style library set on MinGW while restoring the fuller static libcurl dependency closure needed on Rtools. CURL_STATICLIB remains on built objects rather than ./configure probes.
windows_amd64_rtools builds: the package build now pins CC/AR/RANLIB from R CMD config, avoiding mixed compiler/library selection when vendored htslib is configured, and keeps the MinGW static-libcurl configuration aligned with Rtools libcurl.a.
read_bcf(...) / rduckhts_bcf() mapping of fixed-count INFO/FORMAT arrays: exact-cardinality fields such as Number=2 and Number=4 now materialize as DuckDB array/list columns instead of silently dropping all but the first value.
read_bcf(...) / rduckhts_bcf() handling of string FORMAT lists such as DRAGEN FORMAT/LAA: Number != 1 string FORMAT fields now materialize as VARCHAR[] instead of triggering DuckDB internal assertion failures.
duckdb_munge(...)
/ rduckhts_munge() multithreaded FASTA lookups: FASTA index handles are now thread-local and FASTA fetches are synchronized in munge, avoiding intermittent fai_retrieve failures and aborts when fasta_ref is used with PRAGMA threads > 1.
rduckhts_score(): polygenic risk score computation backed by the bcftools +score plugin, supporting GT/DS/HDS/AP/GP/AS dosage modes, all major GWAS summary presets (PLINK, PLINK2, REGENIE, SAIGE, BOLT, METAL, PGS, SSF/GWAS-SSF), GWAS-VCF multi-PRS scoring, p-value thresholding, sample subsetting, and region/filter controls.
rduckhts_munge(): GWAS summary statistics normalization backed by bcftools +munge, with FASTA reference allele resolution, swap-aware effect/frequency transforms, and METAL meta-analysis column support.
rduckhts_liftover(): variant coordinate liftover backed by bcftools +liftover using UCSC chain files, with full indel normalization, INFO/END lifting, and MT passthrough.
rduckhts_bed() for BED3–BED12 interval files and rduckhts_fasta_nuc() for nucleotide composition over BED intervals or fixed-width bins.
rduckhts_bgzip(), rduckhts_bgunzip(), rduckhts_bam_index(), rduckhts_bcf_index(), and rduckhts_tabix_index().
rduckhts_hts_header(), rduckhts_hts_index(), rduckhts_hts_index_spans(), and rduckhts_hts_index_raw().
rduckhts_bam() and rduckhts_fastq() (quality_representation, input_quality_encoding) and rduckhts_detect_quality_encoding() for heuristic FASTQ encoding detection.
sequence_encoding := 'nt16' parameter to rduckhts_bam(), rduckhts_fasta(), and rduckhts_fastq() for raw htslib nt16 sequence output as UTINYINT[].
sam_flag_bits() and sam_flag_has(), CIGAR utility functions, and is_forward_aligned().
Conditionally enable plugins on Windows
Updates the configure script to avoid check failure on CRAN macOS
Update the extension version to 0.1.3
Various fixes for CRAN submission
Modified configure to attempt to support wasm
Update bootstrapped extension code to match duckhts 0.1.2.
Add SAMtags + auxiliary tag support (standard_tags, auxiliary_tags).
Add tabix header/typing options (header, header_names, auto_detect, column_types).
CRAN Submission
Bump bundled duckhts extension version to 0.1.1.
Initial development release.
Bundles the DuckHTS DuckDB extension and htslib for HTS file readers.
Adds table-creation helpers for VCF/BCF, BAM/CRAM, FASTA/FASTQ, GFF/GTF, and tabix.
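
The rduckhts_fasta_nuc() / fasta_nuc(...) entry above states that pct_gc and pct_at use only informative A/C/G/T bases in the denominator. As a rough illustration of that arithmetic, and not the extension's C implementation, here is a minimal Python sketch. The helper name gc_at_fractions, the case-insensitive counting, the 0-1 fraction scale, and returning None for intervals with no informative bases are all assumptions made for this example.

```python
from collections import Counter
from typing import Optional, Tuple


def gc_at_fractions(seq: str) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper: GC and AT fractions over informative bases only."""
    # Assumption: soft-masked lowercase bases count the same as uppercase.
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    at = counts["A"] + counts["T"]
    # N and any other ambiguity code is excluded from the denominator.
    informative = gc + at
    if informative == 0:
        # Assumption: composition is undefined when nothing is informative.
        return None, None
    return gc / informative, at / informative


print(gc_at_fractions("GCGCNNNNATAT"))  # (0.5, 0.5)
print(gc_at_fractions("NNNN"))          # (None, None)
```

With the full interval length as the denominator, the first call would report 4/12 for each fraction, which is the depression by ambiguous bases that the entry describes.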