Prediction Workflow

The prediction workflow uses a variety of evidence, including gene assemblies from short and long reads, protein alignments, homology alignments, and other evidence such as expression, introns and repeats, to generate both ab initio and evidence-based gene predictions.

Welcome to REAT
version - 0.6.0

Command-line call:
reat prediction --help


usage: reat prediction [-h] --genome GENOME --augustus_config_path
                       AUGUSTUS_CONFIG_PATH
                       [--extrinsic_config [EXTRINSIC_CONFIG [EXTRINSIC_CONFIG ...]]]
                       --species SPECIES [--chunk_size CHUNK_SIZE]
                       [--overlap_size OVERLAP_SIZE]
                       [--transcriptome_models [TRANSCRIPTOME_MODELS [TRANSCRIPTOME_MODELS ...]]]
                       [--homology_models [HOMOLOGY_MODELS [HOMOLOGY_MODELS ...]]]
                       [--introns INTRONS]
                       [--firststrand_expression FIRSTSTRAND_EXPRESSION]
                       [--secondstrand_expression SECONDSTRAND_EXPRESSION]
                       [--unstranded_expression UNSTRANDED_EXPRESSION]
                       [--repeats REPEATS] --homology_proteins
                       HOMOLOGY_PROTEINS [--optimise_augustus] [--kfold KFOLD]
                       [--force_train]
                       [--augustus_runs [AUGUSTUS_RUNS [AUGUSTUS_RUNS ...]]]
                       --EVM_weights EVM_WEIGHTS
                       [--hq_protein_alignments [HQ_PROTEIN_ALIGNMENTS [HQ_PROTEIN_ALIGNMENTS ...]]]
                       [--lq_protein_alignments [LQ_PROTEIN_ALIGNMENTS [LQ_PROTEIN_ALIGNMENTS ...]]]
                       [--hq_assembly [HQ_ASSEMBLY [HQ_ASSEMBLY ...]]]
                       [--lq_assembly [LQ_ASSEMBLY [LQ_ASSEMBLY ...]]]
                       [--mikado_utr_files [{gold,silver,bronze,all,hq_assembly,lq_assembly} [{gold,silver,bronze,all,hq_assembly,lq_assembly} ...]]]
                       [--do_glimmer [DO_GLIMMER]] [--do_snap [DO_SNAP]]
                       [--do_codingquarry [DO_CODINGQUARRY]] [--no_augustus]
                       [--filter_top_n FILTER_TOP_N]
                       [--filter_max_identity FILTER_MAX_IDENTITY]
                       [--filter_max_coverage FILTER_MAX_COVERAGE]
                       [--codingquarry_extra_params CODINGQUARRY_EXTRA_PARAMS]
                       [--glimmer_extra_params GLIMMER_EXTRA_PARAMS]
                       [--snap_extra_params SNAP_EXTRA_PARAMS]
                       [--augustus_extra_params AUGUSTUS_EXTRA_PARAMS]
                       [--evm_extra_params EVM_EXTRA_PARAMS]
                       [--min_train_models MIN_TRAIN_MODELS]
                       [--max_train_models MAX_TRAIN_MODELS]
                       [--max_test_models MAX_TEST_MODELS]
                       [--target_mono_exonic_percentage TARGET_MONO_EXONIC_PERCENTAGE]
                       [--force_train_few_models]
                       [--evalue_filter EVALUE_FILTER]
                       [--min_pct_cds_fraction MIN_PCT_CDS_FRACTION]
                       [--max_tp_utr_complete MAX_TP_UTR_COMPLETE]
                       [--max_tp_utr MAX_TP_UTR] [--min_tp_utr MIN_TP_UTR]
                       [--max_fp_utr_complete MAX_FP_UTR_COMPLETE]
                       [--max_fp_utr MAX_FP_UTR] [--min_fp_utr MIN_FP_UTR]
                       [--query_start_hard_filter_distance QUERY_START_HARD_FILTER_DISTANCE]
                       [--query_start_score QUERY_START_SCORE]
                       [--query_start_scoring_distance QUERY_START_SCORING_DISTANCE]
                       [--query_end_hard_filter_distance QUERY_END_HARD_FILTER_DISTANCE]
                       [--query_end_score QUERY_END_SCORE]
                       [--query_end_scoring_distance QUERY_END_SCORING_DISTANCE]
                       [--target_start_hard_filter_distance TARGET_START_HARD_FILTER_DISTANCE]
                       [--target_start_score TARGET_START_SCORE]
                       [--target_start_scoring_distance TARGET_START_SCORING_DISTANCE]
                       [--target_end_hard_filter_distance TARGET_END_HARD_FILTER_DISTANCE]
                       [--target_end_score TARGET_END_SCORE]
                       [--target_end_scoring_distance TARGET_END_SCORING_DISTANCE]
                       [--min_query_coverage_hard_filter MIN_QUERY_COVERAGE_HARD_FILTER]
                       [--min_query_coverage_score MIN_QUERY_COVERAGE_SCORE]
                       [--min_query_coverage_scoring_percentage MIN_QUERY_COVERAGE_SCORING_PERCENTAGE]
                       [--min_target_coverage_hard_filter MIN_TARGET_COVERAGE_HARD_FILTER]
                       [--min_target_coverage_score MIN_TARGET_COVERAGE_SCORE]
                       [--min_target_coverage_scoring_percentage MIN_TARGET_COVERAGE_SCORING_PERCENTAGE]
                       [--max_single_gap_hard_filter MAX_SINGLE_GAP_HARD_FILTER]
                       [--max_single_gap_score MAX_SINGLE_GAP_SCORE]
                       [--max_single_gap_scoring_length MAX_SINGLE_GAP_SCORING_LENGTH]

optional arguments:
  -h, --help            show this help message and exit
  --genome GENOME       Genome fasta file (default: None)
  --augustus_config_path AUGUSTUS_CONFIG_PATH
                        Template path for augustus config, this path will not be modified as a copy will be created internally for the workflow's use (default: None)
  --extrinsic_config [EXTRINSIC_CONFIG [EXTRINSIC_CONFIG ...]]
                        Augustus extrinsic configuration file, defines the boni/mali for each type of feature-evidence combination (default: None)
  --species SPECIES     Name of the species to train models for, if it does not exist in the augustus config path it will be created. (default: None)
  --chunk_size CHUNK_SIZE
                        Maximum length of sequence to be processed by Augustus or EVM (default: 3000000)
  --overlap_size OVERLAP_SIZE
                        Overlap length for sequences longer than chunk_size for EVM and Augustus (default: 100000)
  --transcriptome_models [TRANSCRIPTOME_MODELS [TRANSCRIPTOME_MODELS ...]]
                        Models derived from transcriptomic data (default: None)
  --homology_models [HOMOLOGY_MODELS [HOMOLOGY_MODELS ...]]
                        Models derived from protein alignments (default: None)
  --introns INTRONS     Introns to be used as hints for Augustus (default: None)
  --firststrand_expression FIRSTSTRAND_EXPRESSION
                        Sorted by position first-strand RNASeq alignments used for coverage hints (default: None)
  --secondstrand_expression SECONDSTRAND_EXPRESSION
                        Sorted by position second-strand RNAseq alignments used for coverage hints (default: None)
  --unstranded_expression UNSTRANDED_EXPRESSION
                        Sorted by position unstranded RNAseq alignments used for coverage hints (default: None)
  --repeats REPEATS     Repeat annotation GFF file. (default: None)
  --homology_proteins HOMOLOGY_PROTEINS
                        Protein DB of sequences used for determining whether the evidence provided is full-length or not (default: None)
  --optimise_augustus   Enable augustus metaparameter optimisation (default: False)
  --kfold KFOLD         Number of batches for augustus optimisation (default: 8)
  --force_train         Re-train augustus even if the species is found in the 'augustus_config_path' (default: False)
  --augustus_runs [AUGUSTUS_RUNS [AUGUSTUS_RUNS ...]]
                        File composed of 13 lines with SOURCE PRIORITY pairs for each of the types of evidence that can be used in
                        an Augustus run. These evidence types are: gold models, silver models, bronze models, all models,
                        gold introns, silver introns, protein models, coverage hints, repeat hints, high quality assemblies,
                        low quality assemblies, high quality proteins, and low quality proteins. (default: None)
  --EVM_weights EVM_WEIGHTS
                        Evidence modeler requires a weighting to be provided for each source of evidence, this file is the means to do so. (default: None)
  --hq_protein_alignments [HQ_PROTEIN_ALIGNMENTS [HQ_PROTEIN_ALIGNMENTS ...]]
                        High confidence protein alignments to be used as hints for Augustus runs (default: None)
  --lq_protein_alignments [LQ_PROTEIN_ALIGNMENTS [LQ_PROTEIN_ALIGNMENTS ...]]
                        Low confidence protein alignments to be used as hints for Augustus runs (default: None)
  --hq_assembly [HQ_ASSEMBLY [HQ_ASSEMBLY ...]]
                        High confidence assemblies (for example from HiFi source) to be used as hints for Augustus runs (default: None)
  --lq_assembly [LQ_ASSEMBLY [LQ_ASSEMBLY ...]]
                        Low confidence assemblies (short reads or low quality long reads) to be used as hints for Augustus runs (default: None)
  --mikado_utr_files [{gold,silver,bronze,all,hq_assembly,lq_assembly} [{gold,silver,bronze,all,hq_assembly,lq_assembly} ...]]
                        Choose any combination of space separated values from: gold silver bronze all hq_assembly lq_assembly (default: ['gold', 'silver'])
  --do_glimmer [DO_GLIMMER]
                        Enables GlimmerHmm predictions, optionally accepts a training directory (default: None)
  --do_snap [DO_SNAP]   Enables SNAP predictions, optionally accepts a training directory (default: None)
  --do_codingquarry [DO_CODINGQUARRY]
                        Enables CodingQuarry predictions, optionally accepts a training directory (default: None)
  --no_augustus
  --filter_top_n FILTER_TOP_N
                        Only output the top N transcripts that pass the self blast filter (0 outputs all) (default: 0)
  --filter_max_identity FILTER_MAX_IDENTITY
                        Maximum identity between models for redundancy classification (default: 80)
  --filter_max_coverage FILTER_MAX_COVERAGE
                        Maximum coverage between models for redundancy classification (default: 80)
  --codingquarry_extra_params CODINGQUARRY_EXTRA_PARAMS
                        Extra parameters for CodingQuarry predictions (default: None)
  --glimmer_extra_params GLIMMER_EXTRA_PARAMS
                        Extra parameters for glimmer predictions (default: None)
  --snap_extra_params SNAP_EXTRA_PARAMS
                        Extra parameters for snap predictions (default: None)
  --augustus_extra_params AUGUSTUS_EXTRA_PARAMS
                        Extra parameters for all Augustus predictions (default: None)
  --evm_extra_params EVM_EXTRA_PARAMS
                        Extra parameters for EVM gene predictions consolidation (default: None)
  --min_train_models MIN_TRAIN_MODELS
                        Minimum number of training models (default: 400)
  --max_train_models MAX_TRAIN_MODELS
                        Maximum number of training models (default: 1000)
  --max_test_models MAX_TEST_MODELS
                        Maximum number of test models (default: 200)
  --target_mono_exonic_percentage TARGET_MONO_EXONIC_PERCENTAGE
                        Target percentage of mono-exonic models in the training set (default: 20)
  --force_train_few_models
                        Train Augustus regardless of the mono-exonic model ratio and number of models (default: False)
  --evalue_filter EVALUE_FILTER
                        Hits with an e-value higher than this will be filtered out of the initial set (default: 1e-06)
  --min_pct_cds_fraction MIN_PCT_CDS_FRACTION
                        Transcript requirement for minimum fraction of transcript covered by CDS (default: 0.5)
  --max_tp_utr_complete MAX_TP_UTR_COMPLETE
                        Transcript requirement for maximum number of complete 3' UTRs (default: 1)
  --max_tp_utr MAX_TP_UTR
                        Transcript requirement for maximum number of 3' UTRs (default: 2)
  --min_tp_utr MIN_TP_UTR
                        Transcript requirement for minimum number of 3' UTRs (default: 1)
  --max_fp_utr_complete MAX_FP_UTR_COMPLETE
                        Transcript requirement for maximum number of complete 5' UTRs (default: 2)
  --max_fp_utr MAX_FP_UTR
                        Transcript requirement for maximum number of 5' UTRs (default: 3)
  --min_fp_utr MIN_FP_UTR
                        Transcript requirement for minimum number of 5' UTRs (default: 1)
  --query_start_hard_filter_distance QUERY_START_HARD_FILTER_DISTANCE
                        If query hit starts after this value, the transcript cannot belong to the Gold category (default: 10)
  --query_start_score QUERY_START_SCORE
                        Maximum score for query start distance (default: 5)
  --query_start_scoring_distance QUERY_START_SCORING_DISTANCE
                        Hits with query start distance lower than this parameter start receiving scoring points (default: 30)
  --query_end_hard_filter_distance QUERY_END_HARD_FILTER_DISTANCE
                        If query hit ends after this value, the transcript cannot belong to the Gold category (default: 10)
  --query_end_score QUERY_END_SCORE
                        Maximum score for query end distance (default: 5)
  --query_end_scoring_distance QUERY_END_SCORING_DISTANCE
                        Hits with query end distance lower than this parameter start receiving scoring points (default: 30)
  --target_start_hard_filter_distance TARGET_START_HARD_FILTER_DISTANCE
                        If target hit starts after this value, the transcript cannot belong to the Gold category (default: 10)
  --target_start_score TARGET_START_SCORE
                        Maximum score for target start distance (default: 5)
  --target_start_scoring_distance TARGET_START_SCORING_DISTANCE
                        Hits with target start distance lower than this parameter start receiving scoring points (default: 30)
  --target_end_hard_filter_distance TARGET_END_HARD_FILTER_DISTANCE
                        If target hit ends after this value, the transcript cannot belong to the Gold category (default: 10)
  --target_end_score TARGET_END_SCORE
                        Maximum score for target end distance (default: 5)
  --target_end_scoring_distance TARGET_END_SCORING_DISTANCE
                        Hits with target end distance lower than this parameter start receiving scoring points (default: 30)
  --min_query_coverage_hard_filter MIN_QUERY_COVERAGE_HARD_FILTER
                        Minimum percentage of query covered to classify a hit as Gold (default: 90)
  --min_query_coverage_score MIN_QUERY_COVERAGE_SCORE
                        Maximum score for query percentage coverage (default: 5)
  --min_query_coverage_scoring_percentage MIN_QUERY_COVERAGE_SCORING_PERCENTAGE
                        Queries covered over this percentage value start receiving scoring points (default: 30)
  --min_target_coverage_hard_filter MIN_TARGET_COVERAGE_HARD_FILTER
                        Minimum percentage of target covered to classify a hit as Gold (default: 90)
  --min_target_coverage_score MIN_TARGET_COVERAGE_SCORE
                        Maximum score for target percentage coverage (default: 5)
  --min_target_coverage_scoring_percentage MIN_TARGET_COVERAGE_SCORING_PERCENTAGE
                        Targets covered over this percentage value start receiving scoring points (default: 30)
  --max_single_gap_hard_filter MAX_SINGLE_GAP_HARD_FILTER
                        Any hits containing gaps larger than this parameter cannot be classified as Gold (default: 20)
  --max_single_gap_score MAX_SINGLE_GAP_SCORE
                        Maximum score for hits with single gaps smaller than --max_single_gap_scoring_length (default: 5)
  --max_single_gap_scoring_length MAX_SINGLE_GAP_SCORING_LENGTH
                        Hits containing gaps smaller than this parameter start receiving scoring points (default: 30)

The prediction module takes as input a genome file along with a set of evidence annotations over the genome. These annotations should have a gene->mRNA->{exon,CDS} structure, where CDS features are required for protein inputs. The evidence can come from homology protein or transcript alignments, RNA-seq gene models, repeat annotations, and RNA-seq alignments, which provide evidence for the presence or absence of exons. The user should also provide a set of proteins to validate against; these proteins are used to score the input models, classify them as Gold, Silver or Bronze, and select the best models for training the ab initio gene predictors.
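As a rough illustration of the expected input structure, the following Python sketch checks that GFF3 evidence follows the gene->mRNA->{exon,CDS} hierarchy. The helper names `parse_attributes` and `check_gene_structure`, and the `require_cds` switch for protein-derived inputs, are hypothetical and not part of REAT; the sketch assumes standard nine-column GFF3 with `ID` and `Parent` attributes and only inspects feature types.

```python
# Hypothetical helpers for illustration only; not part of REAT.


def parse_attributes(field: str) -> dict:
    """Parse the GFF3 ninth column (key=value pairs separated by ';')."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def check_gene_structure(gff_lines, require_cds=False):
    """Return a list of problems found in the gene->mRNA->{exon,CDS} structure.

    require_cds: set True for protein-derived inputs, where CDS is mandatory.
    """
    genes, mrnas = set(), {}  # gene IDs; mRNA ID -> parent gene ID
    children = {}             # parent ID -> set of child feature types
    problems = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"malformed line: {line!r}")
            continue
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if ftype == "gene":
            genes.add(attrs.get("ID"))
        elif ftype == "mRNA":
            mrnas[attrs.get("ID")] = attrs.get("Parent")
            children.setdefault(attrs.get("ID"), set())
        elif ftype in ("exon", "CDS"):
            # A feature may list several parents separated by commas
            for parent in attrs.get("Parent", "").split(","):
                children.setdefault(parent, set()).add(ftype)
    for mrna_id, gene_id in mrnas.items():
        if gene_id not in genes:
            problems.append(f"mRNA {mrna_id} has no gene parent")
        kinds = children.get(mrna_id, set())
        if "exon" not in kinds:
            problems.append(f"mRNA {mrna_id} has no exon")
        if require_cds and "CDS" not in kinds:
            problems.append(f"mRNA {mrna_id} has no CDS")
    for parent in children:
        if parent not in mrnas:
            problems.append(f"exon/CDS parent {parent} is not an mRNA")
    return problems
```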

Multiple sets of input models, from homology proteins or transcriptomic sources, are aligned to a protein database of the user’s choice, and the results of these alignments are used to score each input model and classify it as Gold, Silver or Bronze. Models in the Gold and Silver categories are defined by:

  • Having complete but not excessively long UTRs.

  • Being fully covered by multiple proteins from the database.

  • Having a sufficiently long CDS, where the minimum fraction of the transcript covered by CDS is user defined (--min_pct_cds_fraction).
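The transcript requirements above can be sketched as a single predicate. This assumes Mikado-style metrics (the number of UTR segments, and the number of exons lying entirely within the UTR, on each side) and reuses the defaults from the help output above (--min_pct_cds_fraction, --min_fp_utr, --max_fp_utr, --max_fp_utr_complete and their 3' counterparts). The class and function names are hypothetical, and REAT's actual checks may be combined differently.

```python
from dataclasses import dataclass


@dataclass
class TranscriptMetrics:
    """Hypothetical per-transcript metrics (Mikado-style naming assumed)."""
    cds_fraction: float          # fraction of transcript length covered by CDS
    five_utr_num: int            # number of 5' UTR segments
    five_utr_num_complete: int   # exons lying entirely within the 5' UTR
    three_utr_num: int           # number of 3' UTR segments
    three_utr_num_complete: int  # exons lying entirely within the 3' UTR


# Defaults copied from the `reat prediction --help` output above
DEFAULTS = dict(
    min_pct_cds_fraction=0.5,
    max_tp_utr_complete=1, max_tp_utr=2, min_tp_utr=1,
    max_fp_utr_complete=2, max_fp_utr=3, min_fp_utr=1,
)


def meets_transcript_requirements(m: TranscriptMetrics, p: dict = DEFAULTS) -> bool:
    """True if a model has enough CDS and complete, but not excessive, UTRs."""
    return (
        m.cds_fraction >= p["min_pct_cds_fraction"]
        and p["min_fp_utr"] <= m.five_utr_num <= p["max_fp_utr"]
        and m.five_utr_num_complete <= p["max_fp_utr_complete"]
        and p["min_tp_utr"] <= m.three_utr_num <= p["max_tp_utr"]
        and m.three_utr_num_complete <= p["max_tp_utr_complete"]
    )
```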

To score the models, a score is calculated for each of the following properties: the distances between the start and end of the alignment and the start and end of both the model and the target protein; the coverage of the model and of the target protein; and the length of the longest gap in the alignment. Each score is controlled by three user-defined parameters: a ‘hard filter’, beyond which the criterion is considered failed; a ‘soft filter’, from which alignments start receiving a score relative to the difference between the ‘best’ possible value and the observed value; and the maximum possible score.
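The hard filter / soft filter / maximum score scheme can be sketched as follows. The linear ramp between the soft-filter threshold (0 points) and the best possible value (maximum points) is an assumption, since the text only says the score is relative to the distance from the best value. The parameter values are the help-text defaults for the --query_*, --target_*, --min_*_coverage_* and --max_single_gap_* options, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Criterion:
    hard_filter: float        # beyond this value the model cannot be Gold
    scoring_threshold: float  # 'soft filter': points start accruing past here
    max_score: float          # score awarded at the best possible value
    best: float               # best possible value (0 for distances/gaps, 100 for coverage)


def criterion_score(value: float, c: Criterion) -> Tuple[float, bool]:
    """Return (score, passes_hard_filter) for one alignment property.

    Assumes a linear ramp from 0 points at the threshold to max_score at the best value.
    """
    frac = (value - c.scoring_threshold) / (c.best - c.scoring_threshold)
    frac = min(max(frac, 0.0), 1.0)
    if c.best > c.scoring_threshold:   # higher is better (coverage)
        passes = value >= c.hard_filter
    else:                              # lower is better (distances, gaps)
        passes = value <= c.hard_filter
    return c.max_score * frac, passes


# Help-text defaults (start and end distances share the same values)
DISTANCE = Criterion(hard_filter=10, scoring_threshold=30, max_score=5, best=0)
COVERAGE = Criterion(hard_filter=90, scoring_threshold=30, max_score=5, best=100)
GAP = Criterion(hard_filter=20, scoring_threshold=30, max_score=5, best=0)

CRITERIA = {
    "query_start": DISTANCE, "query_end": DISTANCE,
    "target_start": DISTANCE, "target_end": DISTANCE,
    "query_coverage": COVERAGE, "target_coverage": COVERAGE,
    "max_single_gap": GAP,
}


def score_alignment(props: Dict[str, float]) -> Tuple[float, bool]:
    """Sum the per-criterion scores; Gold requires every hard filter to pass."""
    total, gold_eligible = 0.0, True
    for name, crit in CRITERIA.items():
        score, passes = criterion_score(props[name], crit)
        total += score
        gold_eligible = gold_eligible and passes
    return total, gold_eligible
```

With these defaults, a hit starting 15 residues from the query start earns 2.5 of 5 points but fails the 10-residue hard filter, so it can still score well without being eligible for Gold.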

Once the models have been scored, redundant models, whose coverage and identity with another model exceed user-defined thresholds (80% by default), are filtered out. From the remaining models, a user-defined number of models, with a user-defined ratio of mono-exonic to multi-exonic models, are randomly selected to train the ab initio predictors. These models are drawn from the classified models ordered by ‘category’ (Gold, Silver, Bronze, then others) and by score (highest to lowest).
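Redundancy filtering and training-set selection might look like the sketch below. It assumes self-alignment hits are available as (model_a, model_b, identity, coverage) tuples and that each model carries a category, a score and an exon count. Where the text describes a random selection, this sketch simply takes the top-ranked models so that the result is deterministic. Defaults mirror --filter_max_identity, --filter_max_coverage, --max_train_models and --target_mono_exonic_percentage; all names are hypothetical.

```python
from typing import List, Set, Tuple

CATEGORY_RANK = {"Gold": 0, "Silver": 1, "Bronze": 2}  # anything else ranks last


def rank_key(model: dict) -> Tuple[int, float]:
    """Sort by category (Gold, Silver, Bronze, others), then by score descending."""
    return CATEGORY_RANK.get(model["category"], len(CATEGORY_RANK)), -model["score"]


def remove_redundant(models: List[dict], hits: List[Tuple[str, str, float, float]],
                     max_identity: float = 80, max_coverage: float = 80) -> List[dict]:
    """Drop the lower-ranked model of each pair exceeding both identity and coverage."""
    by_id = {m["id"]: m for m in models}
    dropped: Set[str] = set()
    for a, b, identity, coverage in hits:
        if identity > max_identity and coverage > max_coverage:
            worse = max(by_id[a], by_id[b], key=rank_key)  # larger key = worse rank
            dropped.add(worse["id"])
    return [m for m in models if m["id"] not in dropped]


def select_training(models: List[dict], n_models: int = 1000,
                    mono_pct: float = 20) -> List[dict]:
    """Pick the best-ranked models while aiming for mono_pct% mono-exonic ones."""
    n_mono = round(n_models * mono_pct / 100)
    ranked = sorted(models, key=rank_key)
    mono = [m for m in ranked if m["n_exons"] == 1]
    multi = [m for m in ranked if m["n_exons"] > 1]
    return mono[:n_mono] + multi[:n_models - n_mono]
```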

Each of the ab initio predictors selected by the user is then trained and used to generate predictions. In the case of Augustus, an initial ab initio prediction is made with limited evidence, and further rounds of prediction, each with different weights for each evidence type, can then be configured using a file containing a SOURCE and a SCORE value for each evidence type (see Configuring Augustus runs). These values depend on the extrinsic information configuration file used by Augustus; REAT’s default is shown in the Extrinsic information configuration file section. All these predictions are then combined using Evidence Modeler (EVM), with configurable weights for each type of prediction and evidence (see Evidence Modeler default weights file). Finally, the EVM output is processed through Mikado using the Gold and Silver category models (which contain UTRs) to add UTRs where the evidence supports them.

Note

The EVM weights file should contain one line per prediction. When --augustus_runs is used, there should be one line with a label and a weight for each Augustus run. The labels are fixed and take the form AUGUSTUS_RUN#, where # is the position of the corresponding run file in the list passed to --augustus_runs on the command line.

An example weights file with three Augustus runs would look like this:

.
OTHER_PREDICTION AUGUSTUS_RUN1 1
OTHER_PREDICTION AUGUSTUS_RUN2 1
OTHER_PREDICTION AUGUSTUS_RUN3 1
.
.
.
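A quick consistency check between the weights file and the --augustus_runs list could be written as follows. This sketch assumes whitespace-separated CLASS SOURCE WEIGHT lines, skips any other line (such as the `.` placeholders), and reports which 1-based AUGUSTUS_RUN# labels are missing; the function names are hypothetical.

```python
from typing import List, Tuple


def parse_evm_weights(text: str) -> List[Tuple[str, str, float]]:
    """Parse EVM weights lines of the form: CLASS SOURCE WEIGHT (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:  # skip blank or placeholder lines
            continue
        evidence_class, source, weight = fields
        entries.append((evidence_class, source, float(weight)))
    return entries


def missing_augustus_runs(entries: List[Tuple[str, str, float]], n_runs: int) -> List[str]:
    """Return the AUGUSTUS_RUN# labels (1-based) absent from the weights file."""
    sources = {source for _, source, _ in entries}
    expected = [f"AUGUSTUS_RUN{i}" for i in range(1, n_runs + 1)]
    return [label for label in expected if label not in sources]
```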

Configuring Augustus runs

When generating predictions with Augustus, we need to choose the weight parameters for each type of evidence, and we may also want several alternative sets of weights and priorities, so as to predict a comprehensive set of models that maximises our chances of predicting the correct structures. In REAT, the number of Augustus predictions and the weights for each prediction are set using one configuration file per prediction. This file contains a SOURCE and SCORE pair for each of the available evidence types, which are: gold models, silver models, bronze models, all models, gold introns, silver introns, protein models, coverage hints, repeat hints, high quality assemblies, low quality assemblies, high quality proteins, and low quality proteins. Each file provided to the --augustus_runs parameter triggers one Augustus run using the combination of weights and priorities defined for each evidence type, resulting in as many predictions as files provided.

Note

The output directory will contain one file of predictions for each --augustus_runs input file. These files are named augustus_run#, where # is the position of the corresponding file in the command-line list of run files.

The default Augustus extrinsic configuration file can be overridden to make different ‘SOURCE’s available, which can then be used in the --augustus_runs files. The following is an example of a ‘run’ file:

M 10
F 9
E 8
E 7
E 6
E 4
P 4
W 3
RM 1
E 2
E 2
E 2
E 2

Note

The order of the features in this file is as follows:
  • gold models

  • silver models

  • bronze models

  • all models

  • gold introns

  • silver introns

  • protein models

  • coverage hints

  • repeat hints

  • high quality assemblies

  • low quality assemblies

  • high quality proteins

  • low quality proteins
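The fixed ordering makes a run file easy to validate. The sketch below maps each line to its evidence type and checks the SOURCE against those declared in the [SOURCES] section of the default extrinsic file shown in the next section. It assumes the second column (called SCORE in the prose and PRIORITY in the --augustus_runs help) is an integer; the names are hypothetical.

```python
from typing import Dict, Tuple

# Evidence types in the order expected in a run file (from the list above)
EVIDENCE_ORDER = [
    "gold models", "silver models", "bronze models", "all models",
    "gold introns", "silver introns", "protein models", "coverage hints",
    "repeat hints", "high quality assemblies", "low quality assemblies",
    "high quality proteins", "low quality proteins",
]

# Sources declared in the [SOURCES] section of REAT's default extrinsic file
KNOWN_SOURCES = {"M", "RM", "F", "E", "P", "W"}


def parse_run_file(text: str) -> Dict[str, Tuple[str, int]]:
    """Map each evidence type to its (SOURCE, value) pair (hypothetical helper)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != len(EVIDENCE_ORDER):
        raise ValueError(f"expected {len(EVIDENCE_ORDER)} lines, got {len(rows)}")
    parsed = {}
    for evidence, row in zip(EVIDENCE_ORDER, rows):
        if len(row) != 2:
            raise ValueError(f"{evidence}: expected 'SOURCE VALUE', got {row}")
        source, value = row
        if source not in KNOWN_SOURCES:
            raise ValueError(f"{evidence}: unknown source {source!r}")
        parsed[evidence] = (source, int(value))  # assumes integer values
    return parsed
```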

Extrinsic information configuration file

# extrinsic information configuration file for AUGUSTUS
# Mario Stanke (mstanke@gwdg.de)
# David Swarbreck (david.swarbreck@earlham.ac.uk)
# Gemy Kaithakottil (gemy.kaithakottil@earlham.ac.uk)


# source of extrinsic information:
# M manual anchor (required)
# P protein database hit
# E EST/cDNA database hit
# W wiggle track coverage info from RNA-Seq
# RM repeat hint

[SOURCES]
M RM F E P W

[SOURCE-PARAMETERS]
P individual_liability
E individual_liability
W individual_liability
F individual_liability
M individual_liability

[GENERAL]
      start        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
       stop        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
        tss        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
        tts        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
        ass        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
        dss        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
   exonpart        1   .992     M    1  1e+100  RM  1     1    F 1    1e4    E 1    1e2    P 1    1    W 1    1.005
       exon        1   1        M    1  1e+100  RM  1     1    F 1    1e8    E 1    1e4    P 1    1    W 1    1
 intronpart        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
     intron        1   0.01     M    1  1e+100  RM  1     1    F 1    1e8    E 1    1e4    P 1    1e4  W 1    1
    CDSpart        1   1 0.985  M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1e2  W 1    1
        CDS        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1e4  W 1    1
    UTRpart        1   1 0.985  M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
        UTR        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
     irpart        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1
nonexonpart        1   1        M    1  1e+100  RM  1     10   F 1    1      E 1    1      P 1    1    W 1    1
  genicpart        1   1        M    1  1e+100  RM  1     1    F 1    1      E 1    1      P 1    1    W 1    1

#
# Explanation: 
# 
# Please refer to the AUGUSTUS documentation for further details about this file
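To see which bonus factor each source contributes for a feature, a [GENERAL] row can be parsed as below. This is a sketch that assumes, as in the file above, exactly one grade per source, so each source appears as NAME 1 FACTOR, optionally preceded by a local malus (as on the CDSpart and UTRpart rows); refer to the AUGUSTUS documentation for the full format. The function name is hypothetical.

```python
from typing import Dict


def parse_general_row(line: str) -> Dict[str, object]:
    """Parse one [GENERAL] row, assuming a single grade per source (hypothetical helper).

    Assumed layout: feature bonus malus [local_malus], then per source: NAME 1 BONUS_FACTOR
    """
    tokens = line.split()
    feature, bonus, malus = tokens[0], float(tokens[1]), float(tokens[2])
    rest = tokens[3:]
    local_malus = None
    # A numeric token here is the optional local malus (e.g. CDSpart, UTRpart)
    try:
        local_malus = float(rest[0])
        rest = rest[1:]
    except ValueError:
        pass
    factors = {}
    for i in range(0, len(rest), 3):
        source, n_grades, factor = rest[i], int(rest[i + 1]), float(rest[i + 2])
        if n_grades != 1:
            raise ValueError("this sketch only handles one grade per source")
        factors[source] = factor
    return {"feature": feature, "bonus": bonus, "malus": malus,
            "local_malus": local_malus, "factors": factors}
```

For example, the exonpart row yields bonus factors of 1e4 for F and 1e2 for E, while the CDSpart row carries a local malus of 0.985.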

Evidence Modeler default weights file

ABINITIO_PREDICTION     GlimmerHMM      1
ABINITIO_PREDICTION     SNAP    1
ABINITIO_PREDICTION     CodingQuarry_v2.0       1
ABINITIO_PREDICTION AUGUSTUS_RUN_ABINITIO 1
PROTEIN hq_protein_alignment    4
PROTEIN lq_protein_alignment    1
TRANSCRIPT      hq_assembly     4
TRANSCRIPT      lq_assembly     1
TRANSCRIPT      homology_models 10
TRANSCRIPT      transcriptome_models    10
OTHER_PREDICTION        AUGUSTUS_RUN1   10
OTHER_PREDICTION        AUGUSTUS_RUN2   10
OTHER_PREDICTION        AUGUSTUS_RUN3   10

Configurable computational resources available

      "ei_prediction.AlignProteins.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.Augustus.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.AugustusAbinitio.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.ExecuteEVMCommand.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.IndexProteinsDatabase.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.LengthChecker.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.Mikado.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.MikadoPick.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)",
"ei_prediction.SelfBlastFilter.resources": " {
               cpu_cores -> Int
              max_retries -> Int?
              boot_disk_gb -> Int?
              queue -> String?
              disk_gb -> Int?
              constraints -> String?
              mem_gb -> Float?
              preemptible_tries -> Int?
              }? (optional)"
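The resource entries above share one shape: an optional object in which only cpu_cores is mandatory. Assuming overrides are supplied as a JSON object keyed by the names shown (how this file is passed to REAT is not described in this section), the sketch below builds and type-checks such entries. The helper name and the example values (4 cores, 16 GB, a queue named "long") are hypothetical.

```python
import json
from typing import Dict

# Field names and types as listed above; only cpu_cores is non-optional
RESOURCE_FIELDS = {
    "cpu_cores": int, "max_retries": int, "boot_disk_gb": int, "queue": str,
    "disk_gb": int, "constraints": str, "mem_gb": float, "preemptible_tries": int,
}


def resource_block(cpu_cores: int, **optional) -> Dict[str, object]:
    """Build one resources object, rejecting unknown fields or wrong types (hypothetical)."""
    block = {"cpu_cores": cpu_cores, **optional}
    for name, value in block.items():
        expected = RESOURCE_FIELDS.get(name)
        if expected is None:
            raise KeyError(f"unknown resource field: {name}")
        # Allow ints where the declared type is Float (e.g. mem_gb=16)
        ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
        if not ok or isinstance(value, bool):
            raise TypeError(f"{name} should be {expected.__name__}")
    return block


# Hypothetical example values
overrides = {
    "ei_prediction.Augustus.resources": resource_block(cpu_cores=4, mem_gb=16.0),
    "ei_prediction.AlignProteins.resources": resource_block(cpu_cores=8, queue="long"),
}
print(json.dumps(overrides, indent=2))
```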
Prediction workflow diagram