Check Documentation

chalicelib.checks.audit_checks.check_validation_errors(connection, **kwargs): Counts the items in fourfront that have schema validation errors and returns a link to a search if any are found.

chalicelib.checks.audit_checks.paired_end_info_consistent(connection, **kwargs): Check that fastqs with a paired_end number have a paired_with related_file, and vice versa.
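The rule behind paired_end_info_consistent can be sketched as below. This is an illustration only, not the check's actual code: the function name and the dictionary layout (a paired_end key and a related_files list whose entries carry a relationship_type of "paired with") are assumptions about the file metadata.

```python
def paired_end_is_consistent(fastq):
    """Return True when paired_end and a 'paired with' relation are both present or both absent."""
    # Assumed layout: 'paired_end' holds the end number, 'related_files' is a list of dicts.
    has_paired_end = bool(fastq.get("paired_end"))
    has_paired_with = any(
        rel.get("relationship_type") == "paired with"
        for rel in fastq.get("related_files", [])
    )
    return has_paired_end == has_paired_with
```

A file is reported only when exactly one of the two pieces of information is present.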

chalicelib.checks.es_checks.clean_s3_es_checks(connection, **kwargs): Cleans checks older than one month from both s3 and es. Must be called from a specific check, as it will take too long otherwise.

chalicelib.checks.es_checks.elasticsearch_s3_count_diff(connection, **kwargs): Reports the difference between the number of files on s3 and es

chalicelib.checks.es_checks.migrate_checks_to_es(connection, **kwargs): Migrates checks from s3 to es. If a check name is given, only those checks will be migrated.

chalicelib.checks.system_checks.check_long_running_ec2s(connection, **kwargs): Flags all ec2s that have been running for longer than 1 week (WARN) or 2 weeks (FAIL), if their names contain any string from flag_names or if they have no name.
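The thresholds described for check_long_running_ec2s can be sketched as follows. This is a hypothetical helper, not the check's implementation: the function name, its arguments, and the "PASS" label for unflagged or young instances are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def classify_instance(name, launch_time, flag_names, now=None):
    """Return 'FAIL', 'WARN' or 'PASS' for one instance, following the documented thresholds."""
    now = now or datetime.now(timezone.utc)
    # Only unnamed instances, or instances whose name contains a flagged string, are considered.
    flagged = not name or any(flag in name for flag in flag_names)
    if not flagged:
        return "PASS"
    age = now - launch_time
    if age > timedelta(weeks=2):
        return "FAIL"
    if age > timedelta(weeks=1):
        return "WARN"
    return "PASS"
```

Passing now explicitly keeps the helper deterministic, which makes the thresholds easy to test.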

chalicelib.checks.system_checks.elastic_beanstalk_health(connection, **kwargs): Check both environment health and health of individual instances

chalicelib.checks.system_checks.elastic_search_space(connection, **kwargs): Checks that our ES nodes all have a certain amount of space remaining

chalicelib.checks.system_checks.scale_down_elasticsearch_production(connection, **kwargs)

Scales down Elasticsearch (production configuration).

HOT (0600 to 2000 EST):

Master: 3x c5.large.elasticsearch
Data: 2x c5.2xlarge.elasticsearch

COLD (2000 to 0600 EST): This is what we are resizing to

Master: None
Data: 3x c5.xlarge.elasticsearch

XXX: should probably use constants in ElasticSearchServiceClient. For now, must be explicitly triggered, but should be put on a schedule.

chalicelib.checks.system_checks.scale_up_elasticsearch_production(connection, **kwargs)

Scales up Elasticsearch (production configuration).

HOT (0600 to 2000 EST): This is what we are resizing to

Master: 3x c5.large.elasticsearch
Data: 2x c5.2xlarge.elasticsearch

COLD (2000 to 0600 EST):

Master: None
Data: 2x c5.large.elasticsearch

XXX: should probably use constants in ElasticSearchServiceClient. For now, must be explicitly triggered, but should be put on a schedule.
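Since both scaling checks are meant to end up on a schedule, the HOT and COLD windows can be expressed as a small helper. This is a hypothetical sketch and not part of the checks: the function name and the handling of the boundary hours (0600 counts as HOT, 2000 counts as COLD) are assumptions.

```python
def es_window(hour):
    """Return 'HOT' for hours 0600 up to (not including) 2000 EST, otherwise 'COLD'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    return "HOT" if 6 <= hour < 20 else "COLD"
```

A scheduler could call the scale-up check when the window turns HOT and the scale-down check when it turns COLD.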

chalicelib.checks.system_checks.wipe_cgap_build_indices(connection, **kwargs): Wipes build indices for CGAP (on cgap-testing)

chalicelib.checks.wrangler_checks.add_suggested_enum_values(connection, **kwargs): No action is added yet; this is a placeholder for an automated PR that adds the new values.

chalicelib.checks.wrangler_checks.check_external_references_uri(connection, **kwargs): Check if external_references.uri is missing while external_references.ref is present.

chalicelib.checks.wrangler_checks.check_for_ontology_updates(connection, **kwargs)

Checks for updates in one of the three main ontologies that the 4DN data portal uses: EFO, UBERON, and OBI.

EFO: checks github repo for new releases and compares release tag. Release tag is a semantic version number starting with ‘v’.

OBI: checks github repo for new releases and compares release tag. Release tag is a ‘v’ plus the release date.

UBERON: github site doesn’t have official ‘releases’ (and website isn’t properly updated), so checks for commits that have a commit message containing ‘new release’.

If version numbers to compare against aren’t specified in the UI, it will use the ones from the previous primary check result.
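For the EFO case, comparing semantic-version release tags can be sketched as below. This is only an illustration under assumptions: the helper names are hypothetical, the tags are assumed to be purely numeric after the leading ‘v’, and the actual check may compare tags differently.

```python
def parse_semver_tag(tag):
    """Turn a tag such as 'v3.45.0' into a tuple of ints, here (3, 45, 0)."""
    if not tag.startswith("v"):
        raise ValueError(f"expected a tag starting with 'v', got {tag!r}")
    return tuple(int(part) for part in tag[1:].split("."))


def has_newer_release(latest_tag, known_tag):
    """Return True when latest_tag is a higher version than known_tag."""
    return parse_semver_tag(latest_tag) > parse_semver_tag(known_tag)
```

Comparing tuples of integers avoids the pitfall of plain string comparison, where 'v3.9.0' would sort after 'v3.10.0'.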

chalicelib.checks.wrangler_checks.check_opf_lab_different_than_experiment(connection, **kwargs): Check if other processed files have a lab (generating lab) that is different from the lab that generated the experiment. In this case, the experimental lab needs to be added to the opf (contributing lab).

chalicelib.checks.wrangler_checks.check_suggested_enum_values(connection, **kwargs)

On our schemas we have a list of suggested values for fields tagged suggested_enum. A value that is not in this list can still be accepted, and this check finds all values for each suggested enum field that are not in its list. There are 2 functions below:

find_suggested_enum

This function takes the properties for an item type (taken from /profiles/) and goes field by field, looking for suggested enum lists; it is recursive so that sub-embedded objects (tagged as type=object) are handled. It also collects ignored enum lists (enums which are not suggested, but are ignored in the subsequent search).

After running this function, we construct a search url for each field, where we exclude all values listed under suggested_enum (and ignored_enum) from the search. For example, for the FileProcessed field ‘my_field’ with options [val1, val2], the url would be: /search/?type=FileProcessed&my_field!=val1&my_field!=val2&my_field!=No value
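The url construction can be sketched as follows. The helper name is hypothetical and the values are left unencoded to mirror the example url; real code may need to url-encode them.

```python
def build_search_url(item_type, field, known_values):
    """Build a search url that excludes every known value, plus 'No value', for one field."""
    parts = [f"type={item_type}"]
    # One 'field!=value' clause per suggested or ignored value, then the 'No value' clause.
    parts += [f"{field}!={value}" for value in list(known_values) + ["No value"]]
    return "/search/?" + "&".join(parts)
```

Calling build_search_url("FileProcessed", "my_field", ["val1", "val2"]) gives the url from the example.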

extract value

Once we have the search result for a field, we dissect it (again for sub-embedded items or lists) to extract the field value, and count occurrences of each new value (e.g. val3:10, val4:15).

*deleted items are not considered by this check
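The extraction and counting step can be sketched as below. This is a minimal illustration, not the check's code: the function names and the dotted field path (for example "a.b" for a sub-embedded field) are assumptions.

```python
from collections import Counter


def extract_values(item, field_path):
    """Yield every value found at a dotted field path, descending into dicts and lists."""
    key, _, rest = field_path.partition(".")
    value = item.get(key)
    if value is None:
        return
    # Treat single values and lists uniformly.
    children = value if isinstance(value, list) else [value]
    for child in children:
        if rest:
            # More path segments remain, so only sub-embedded objects can be followed.
            if isinstance(child, dict):
                yield from extract_values(child, rest)
        else:
            yield child


def count_new_values(search_results, field_path):
    """Count how often each value occurs at field_path across the search results."""
    counts = Counter()
    for item in search_results:
        counts.update(extract_values(item, field_path))
    return dict(counts)
```

Because the search already excludes the suggested and ignored values, everything counted here is a new value.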

chalicelib.checks.wrangler_checks.clone_cases(connection, **kwargs)

chalicelib.checks.wrangler_checks.core_project_status(connection, **kwargs)

Ensure CGAP Core projects have their objects shared.

Default behavior is to check only VariantSample objects, but defining ‘item_type’ in check_setup.json will override the default and check status for all objects defined there.

chalicelib.checks.wrangler_checks.get_metadata_for_cases_to_clone(connection, **kwargs)

chalicelib.checks.wrangler_checks.grouped_with_file_relation_consistency(connection, **kwargs): Check if “grouped with” file relationships are reciprocal and complete. While other types of file relationships are automatically updated on the related file, “grouped with” ones need to be explicitly (manually) patched on the related file. This check ensures that there are no related files that lack the reciprocal relationship, or that lack some of the group relationships (for groups larger than 2 files).
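The reciprocity and completeness rule can be sketched as below. This is an illustration under assumptions: the function name is hypothetical, and the input is assumed to be a mapping from each file id to the set of file ids it declares as “grouped with”.

```python
def find_missing_grouped_with(groups):
    """Return {file_id: set of group members it fails to list} for incomplete relationships."""
    missing = {}
    for file_id, partners in groups.items():
        # The full group implied by this file: itself plus everything it lists.
        members = {file_id} | set(partners)
        for partner in partners:
            # Each partner should list every other member of the group.
            expected = members - {partner}
            lacking = expected - set(groups.get(partner, ()))
            if lacking:
                missing.setdefault(partner, set()).update(lacking)
    return missing
```

An empty result means every relationship is reciprocal and every group is complete.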

chalicelib.checks.wrangler_checks.queue_variants_to_update_genelist(connection, **kwargs)

Add variant samples to indexing queue to update gene lists.

Works with the output of update_variant_genelist().

chalicelib.checks.wrangler_checks.share_core_project(connection, **kwargs)

Change CGAP Core project item status to shared.

Patches the status of the output of core_project_status above.

chalicelib.checks.wrangler_checks.update_variant_genelist(connection, **kwargs)

Searches for variant samples with genes in gene lists that are not currently embedded in the item, only for gene lists uploaded within a certain time frame (default is 1 day and ~30 minutes).

Because of the reverse link from gene to gene list, variant samples are not invalidated upon addition of a new gene list. This check and the associated action search through variant samples with genes belonging to recent gene lists and add them to the indexing queue if the gene lists are not embedded.

chalicelib.checks.wrangler_checks.validate_entrez_geneids(connection, **kwargs): Queries NCBI to see if gene IDs are valid.

chalicelib.checks.wrangler_checks.workflow_run_has_deleted_input_file(connection, **kwargs)

Checks all wfrs that are not deleted and have deleted input files. There is an option to compare to the last result and only report new cases (cmp_to_last). The full output has 2 keys, because we report provenance wfrs but do not run the action on them:

problematic_provenance: stores the uuid of the deleted file, and the wfr that is not deleted

problematic_wfr: stores the deleted file, the wfr to be deleted, and its downstream items (qcs and output files)