Global Asm4pg assembly report for: QRobur_dtol_hic

Asm4pg Parameters

  • Hifiasm mode: hi-c

  • If hi-c or trio mode:

    • parent1/r1: clean_R1.fastq.gz
    • parent2/r2: clean_R2.fastq.gz
  • Hifiasm purge force : 3

  • Purge_dups executed: No

  • Genome scaffolded: No

  • Busco lineage: eudicots_odb10

  • Genome ploidy: 2

  • K-mer size: 21

Raw data QC

Reads statistics

# number of contigs:     4566215
# total contigs length:  63260161666
# mean contig size:      13853.96
# contig size first quartile: 9294
# median contig size:         13919
# contig size third quartile: 17865
# longest contig:             53004
# shortest contig:            51
# contigs > 500 nt:           4563690 (99.94 %)
# contigs > 1K nt:            4562727 (99.92 %)
# contigs > 10K nt:           3282316 (71.88 %)
# contigs > 100K nt:          0 (0.00 %)
# contigs > 1M nt:            0 (0.00 %)
# N50:                   16322
# L50:                   1553029
# N80:                   11702
# L80:                   2901515

QC on final assembly

Assembly statistics

Hap 1

# number of contigs:     877
# total contigs length:  849314776
# mean contig size:      968431.90
# contig size first quartile: 35721
# median contig size:         45222
# contig size third quartile: 70000
# longest contig:             98921506
# shortest contig:            1000
# contigs > 500 nt:           877 (100.00 %)
# contigs > 1K nt:            871 (99.32 %)
# contigs > 10K nt:           863 (98.40 %)
# contigs > 100K nt:          96 (10.95 %)
# contigs > 1M nt:            13 (1.48 %)
# N50:                   62438176
# L50:                   6
# N80:                   54344288
# L80:                   10

Hap 2

# number of contigs:     112
# total contigs length:  809654976
# mean contig size:      7229062.29
# contig size first quartile: 34102
# median contig size:         50665
# contig size third quartile: 139789
# longest contig:             100233192
# shortest contig:            1000
# contigs > 500 nt:           112 (100.00 %)
# contigs > 1K nt:            111 (99.11 %)
# contigs > 10K nt:           107 (95.54 %)
# contigs > 100K nt:          33 (29.46 %)
# contigs > 1M nt:            14 (12.50 %)
# N50:                   67606007
# L50:                   5
# N80:                   55358563
# L80:                   10
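The N50/L50 and N80/L80 lines in the statistics blocks above follow the usual definition: sort sequences from longest to shortest, accumulate lengths until a given share of the total is reached, then report the length of the last sequence added (Nx) and how many sequences were needed (Lx). A minimal sketch, assuming that standard definition; the function name `nx_lx` and the toy lengths are illustrative and do not come from the pipeline:

```python
def nx_lx(lengths, percent=50):
    """Return (Nx, Lx) for a list of sequence lengths.

    Nx is the length of the sequence at which the running total, taken from
    longest to shortest, first reaches `percent` % of the summed length.
    Lx is the number of sequences needed to get there.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count


# Toy data (hypothetical lengths, total = 300).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths, 50)
n80, l80 = nx_lx(toy_lengths, 80)
print(f"N50={n50} L50={l50} N80={n80} L80={l80}")
```

Run on the real contig lengths of a haplotype FASTA, the same function should give values comparable to the N50/L50 and N80/L80 lines reported here, provided the reporting script uses the same convention.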

K-mer profiles

[Figures: k-mer profile plots for Hap 1 and Hap 2]

K-mer completeness and error rate

Completeness

assembly    k-mer set   k-mers found in assembly   k-mers in read set   completeness (%)
tmp_hap2    all         482294416                  623596779            77.3407

Error rate

assembly    k-mers only in assembly   total k-mers in assembly   QV        error rate
tmp_hap2    7439                      809633296                  63.5899   4.37531e-07
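The two rows above can be recomputed from their own counts. Completeness is the share of read k-mers recovered in the assembly. The error rate is consistent with the k-mer survival model used by Merqury-style tools, E = 1 - (1 - asm_only / asm_total)^(1/k), with QV = -10 log10(E). The sketch below assumes that model and the k-mer size of 21 listed in the parameters; the function names are illustrative:

```python
import math


def kmer_completeness(found_in_assembly, total_in_reads):
    """Percentage of read-set k-mers that are also present in the assembly."""
    return 100.0 * found_in_assembly / total_in_reads


def kmer_qv(asm_only, asm_total, k):
    """Return (QV, per-base error rate) from k-mer counts.

    asm_only  : k-mers seen in the assembly but absent from the reads
    asm_total : all k-mers in the assembly
    """
    # 1 - (1 - x)**(1/k), written with log1p/expm1 to keep precision for tiny x.
    error = -math.expm1(math.log1p(-asm_only / asm_total) / k)
    qv = -10.0 * math.log10(error)
    return qv, error


completeness = kmer_completeness(482294416, 623596779)
qv, error = kmer_qv(7439, 809633296, k=21)
print(f"completeness={completeness:.4f}%  QV={qv:.4f}  error={error:.5e}")
```

With the counts from this report the sketch reproduces the reported completeness, QV and error rate to the printed precision, which supports the column interpretation used here.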

BUSCO score

Hap 1

# BUSCO version is: 5.7.1 
# The lineage dataset is: eudicots_odb10 (Creation date: 2024-01-08, number of genomes: 31, number of BUSCOs: 2326)
# Summarized benchmarking in BUSCO notation for file /home/lpiat/work/asm_article_benchmark/asm_hic/results/asm4pg_hic_results/02_final_assembly/hap1/asm4pg_hic_final_hap1.fasta
# BUSCO was run in mode: euk_genome_min
# Gene predictor used: miniprot

    ***** Results: *****

    C:98.4%[S:93.7%,D:4.7%],F:1.1%,M:0.5%,n:2326,E:2.6%    
    2290    Complete BUSCOs (C) (of which 60 contain internal stop codons)
    2180    Complete and single-copy BUSCOs (S)
    110     Complete and duplicated BUSCOs (D)
    25      Fragmented BUSCOs (F)
    11      Missing BUSCOs (M)
    2326    Total BUSCO groups searched

Assembly Statistics:
    877          Number of scaffolds
    1086         Number of contigs
    849314776    Total length
    0.002%       Percent gaps
    62 MB        Scaffold N50
    42 MB        Contigs N50


Dependencies and versions:
    hmmsearch: 3.1
    bbtools: 39.01
    miniprot_index: 0.13-r248
    miniprot_align: 0.13-r248
    python: sys.version_info(major=3, minor=7, micro=12, releaselevel='final', serial=0)
    busco: 5.7.1

Hap 2

# BUSCO version is: 5.7.1 
# The lineage dataset is: eudicots_odb10 (Creation date: 2024-01-08, number of genomes: 31, number of BUSCOs: 2326)
# Summarized benchmarking in BUSCO notation for file /home/lpiat/work/asm_article_benchmark/asm_hic/results/asm4pg_hic_results/02_final_assembly/hap2/asm4pg_hic_final_hap2.fasta
# BUSCO was run in mode: euk_genome_min
# Gene predictor used: miniprot

    ***** Results: *****

    C:98.6%[S:93.9%,D:4.7%],F:0.9%,M:0.5%,n:2326,E:2.4%    
    2294    Complete BUSCOs (C) (of which 56 contain internal stop codons)
    2184    Complete and single-copy BUSCOs (S)
    110     Complete and duplicated BUSCOs (D)
    21      Fragmented BUSCOs (F)
    11      Missing BUSCOs (M)
    2326    Total BUSCO groups searched

Assembly Statistics:
    112          Number of scaffolds
    274          Number of contigs
    809654976    Total length
    0.002%       Percent gaps
    67 MB        Scaffold N50
    39 MB        Contigs N50


Dependencies and versions:
    hmmsearch: 3.1
    bbtools: 39.01
    miniprot_index: 0.13-r248
    miniprot_align: 0.13-r248
    python: sys.version_info(major=3, minor=7, micro=12, releaselevel='final', serial=0)
    busco: 5.7.1
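To compare the two haplotypes programmatically, the one-line BUSCO notation (C, S, D, F, M, n) can be parsed into numbers. The sketch below is a small regular-expression parser written for the summary lines shown in this report; it is not part of BUSCO, and the names `SUMMARY_RE` and `parse_busco_summary` are illustrative:

```python
import re

# Matches e.g. "C:98.4%[S:93.7%,D:4.7%],F:1.1%,M:0.5%,n:2326"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Extract the C/S/D/F/M percentages and n from a BUSCO summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["n"] = int(scores["n"])
    return scores


hap1 = parse_busco_summary("C:98.4%[S:93.7%,D:4.7%],F:1.1%,M:0.5%,n:2326,E:2.6%")
hap2 = parse_busco_summary("C:98.6%[S:93.9%,D:4.7%],F:0.9%,M:0.5%,n:2326,E:2.4%")

for name, scores in (("Hap 1", hap1), ("Hap 2", hap2)):
    # Complete = single-copy + duplicated in both summaries of this report.
    print(name, scores, "S+D =", round(scores["S"] + scores["D"], 1))
```

In both summaries S + D equals the reported C value, and S, D, F and M add up to 100 %.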

Telomeres

Telomeres present in assembly

Hap 1

##########
877 sequences to analyze for telomeric repeats (TTAGGG/CCCTAA) in file /home/lpiat/work/asm_article_benchmark/asm_hic/results/asm4pg_hic_results/02_final_assembly/hap1/asm4pg_hic_final_hap1.fasta
##########

scaffold_1   Forward (start of sequence)     AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA
scaffold_1   Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG
scaffold_2   Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC
scaffold_2   Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG
scaffold_3   Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC
scaffold_3   Reverse (end of sequence)   AGGGTTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
scaffold_4   Forward (start of sequence)     CCTAAACCCTAAACCCTATAACCCTAAACCCTAAACCCTAAACCCTAAAC
scaffold_4   Reverse (end of sequence)   TTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTT
scaffold_5   Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG
scaffold_6   Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
scaffold_7   Forward (start of sequence)     TAAACCCCTAAACCCTATTAAAGCCCTAAACCCTACCCAAAACCTAAACC
scaffold_7   Reverse (end of sequence)   TAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
scaffold_8   Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
scaffold_9   Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC
scaffold_9   Reverse (end of sequence)   GGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAG
scaffold_10      Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC
scaffold_10      Reverse (end of sequence)   TTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTT
scaffold_11      Forward (start of sequence)     AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA
scaffold_11      Reverse (end of sequence)   TAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
scaffold_12      Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
scaffold_12      Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG
scaffold_14      Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTGAAACCCT
scaffold_19      Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGGTTTAGGGTTAGGGTTTAGTAGGGTTGT

Telomeres found: 23 (12 forward, 11 reverse)

Hap 2

##########
112 sequences to analyze for telomeric repeats (TTAGGG/CCCTAA) in file /home/lpiat/work/asm_article_benchmark/asm_hic/results/asm4pg_hic_results/02_final_assembly/hap2/asm4pg_hic_final_hap2.fasta
##########

scaffold_1   Forward (start of sequence)     ACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAA
scaffold_1   Reverse (end of sequence)   TTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGT
scaffold_2   Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
scaffold_2   Reverse (end of sequence)   TAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
scaffold_3   Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
scaffold_3   Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG
scaffold_4   Reverse (end of sequence)   TTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTT
scaffold_5   Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
scaffold_5   Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG
scaffold_6   Forward (start of sequence)     CGCTAAACCCTAAACCCTAAACCCCTAAAACCCTAAACCCTAAACCCTAC
scaffold_6   Reverse (end of sequence)   TTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGT
scaffold_7   Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
scaffold_7   Reverse (end of sequence)   GGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAG
scaffold_8   Forward (start of sequence)     TAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT
scaffold_8   Reverse (end of sequence)   GGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAG
scaffold_9   Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC
scaffold_9   Reverse (end of sequence)   AGGGTTTAGGGTTTAGGTTTAGGGTTTAGGGTTTAGGGGTTTAGGGTTTA
scaffold_10      Forward (start of sequence)     AACCACCAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
scaffold_10      Reverse (end of sequence)   TAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
scaffold_11      Forward (start of sequence)     ACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAA
scaffold_11      Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG
scaffold_12      Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
scaffold_12      Reverse (end of sequence)   GGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTTT
scaffold_34      Forward (start of sequence)     TAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT

Telomeres found: 24 (12 forward, 12 reverse)
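The per-scaffold listings above can be tallied to see which scaffolds carry a telomeric repeat at both ends, which the "Telomeres found" summary lines do not state directly. The sketch below parses lines in the layout shown in this report; the regular expression and the function name `tally_telomeres` are assumptions for illustration, and only four lines from the Hap 1 listing are embedded as sample input:

```python
import re
from collections import defaultdict

# One hit per line: scaffold name, orientation, end label, 50-nt window.
LINE_RE = re.compile(
    r"^(\S+)\s+(Forward|Reverse)\s+\((?:start|end) of sequence\)\s+([ACGTN]+)$"
)


def tally_telomeres(lines):
    """Return (forward_count, reverse_count, scaffolds_with_both_ends)."""
    ends = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            scaffold, orientation, _window = match.groups()
            ends[scaffold].add(orientation)
    forward = sum("Forward" in found for found in ends.values())
    reverse = sum("Reverse" in found for found in ends.values())
    both = sorted(name for name, found in ends.items()
                  if found == {"Forward", "Reverse"})
    return forward, reverse, both


# Four lines copied from the Hap 1 listing (scaffolds 1, 5 and 6).
SAMPLE = [
    "scaffold_1   Forward (start of sequence)     AACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAA",
    "scaffold_1   Reverse (end of sequence)   GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGG",
    "scaffold_5   Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG",
    "scaffold_6   Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA",
]

forward, reverse, both = tally_telomeres(SAMPLE)
print(forward, reverse, both)
```

Counting the full listings the same way gives 9 scaffolds with both ends in Hap 1 (scaffolds 1, 2, 3, 4, 7, 9, 10, 11, 12) and 11 in Hap 2 (scaffolds 1, 2, 3 and 5 to 12).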

Transposable element analysis

Hap 1

LTR recap

==================================================
file name: tmp_hap.fasta            
sequences:           877
total length:  849314776 bp  (849293876 bp excl N/X-runs)
GC level:         35.84 %
bases masked:  338617518 bp ( 39.87 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       655589    338617518 bp   39.87 %
   SINEs:                0            0 bp    0.00 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:    655589    338617518 bp   39.87 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia       99659     48028055 bp    5.65 %
     Gypsy/DIRS1    100728     94037381 bp   11.07 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:            0            0 bp    0.00 %

Total interspersed repeats:   338617518 bp   39.87 %


Small RNA:               0            0 bp    0.00 %

Satellites:              0            0 bp    0.00 %
Simple repeats:          0            0 bp    0.00 %
Low complexity:          0            0 bp    0.00 %
==================================================

LAI

Chr             From    To           Intact    Total     raw_LAI    LAI
whole_genome    1       849293876    0.0671    0.3911    17.15      22.78

Hap 2

LTR recap

==================================================
file name: tmp_hap.fasta            
sequences:           112
total length:  809654976 bp  (809638776 bp excl N/X-runs)
GC level:         35.75 %
bases masked:  304230671 bp ( 37.58 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       597925    304230671 bp   37.58 %
   SINEs:                0            0 bp    0.00 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:    597925    304230671 bp   37.58 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia      113613     56848832 bp    7.02 %
     Gypsy/DIRS1    107312     67044763 bp    8.28 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:            0            0 bp    0.00 %

Total interspersed repeats:   304230671 bp   37.58 %


Small RNA:               0            0 bp    0.00 %

Satellites:              0            0 bp    0.00 %
Simple repeats:          0            0 bp    0.00 %
Low complexity:          0            0 bp    0.00 %
==================================================
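The percentage column of the two LTR recap tables is the occupied length divided by the total sequence length. In both haplotypes the Ty1/Copia and Gypsy/DIRS1 rows account for only part of the LTR total, so the remainder belongs to LTR elements not assigned to either of those two rows. The sketch below recomputes these figures from the numbers in the tables; the dictionary layout and function names are illustrative:

```python
def percent_of_sequence(occupied_bp, total_bp):
    """Occupied length as a percentage of the total sequence length."""
    return 100.0 * occupied_bp / total_bp


# Values copied from the two LTR recap tables of this report (in bp).
HAPS = {
    "Hap 1": {"total": 849314776, "ltr": 338617518,
              "copia": 48028055, "gypsy": 94037381},
    "Hap 2": {"total": 809654976, "ltr": 304230671,
              "copia": 56848832, "gypsy": 67044763},
}


def ltr_breakdown(record):
    """Return the LTR share and the part not listed as Copia or Gypsy."""
    other_bp = record["ltr"] - record["copia"] - record["gypsy"]
    return {
        "ltr_pct": percent_of_sequence(record["ltr"], record["total"]),
        "other_ltr_bp": other_bp,
        "other_ltr_pct": percent_of_sequence(other_bp, record["total"]),
    }


for name, record in HAPS.items():
    result = ltr_breakdown(record)
    print(f"{name}: LTR {result['ltr_pct']:.2f} %, "
          f"other LTR {result['other_ltr_bp']} bp "
          f"({result['other_ltr_pct']:.2f} %)")
```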

LAI

Chr             From    To           Intact    Total     raw_LAI    LAI
whole_genome    1       809638776    0.0694    0.3681    18.85      23.58
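In both LAI rows of this report, raw_LAI matches the intact LTR fraction divided by the total LTR fraction, times 100. The final LAI column includes a further adjustment that cannot be recomputed from the columns shown here, so the sketch below covers raw_LAI only. The function name is illustrative, and small differences from the reported values are expected because the Intact and Total columns are rounded to four decimals:

```python
def raw_lai(intact_fraction, total_fraction):
    """Raw LAI: intact LTR fraction over total LTR fraction, times 100."""
    if total_fraction <= 0:
        raise ValueError("total LTR fraction must be positive")
    return 100.0 * intact_fraction / total_fraction


# Intact and Total columns copied from the two LAI rows of this report.
hap1_raw = raw_lai(0.0671, 0.3911)
hap2_raw = raw_lai(0.0694, 0.3681)
print(f"Hap 1 raw LAI ~ {hap1_raw:.2f}, Hap 2 raw LAI ~ {hap2_raw:.2f}")
```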