
Feature/35869/st georges importer#77

Merged
shilpigoeldev merged 69 commits into develop from feature/35869/st_georges_importer
Jun 21, 2024

Conversation

@lauramccluskey1

Contributor

What?
A new St Georges importer has been written to account for a new data format (see planio ticket 35869). The old St Georges importer is still needed, so it has been kept in the repository.

Why?
St Georges changed the format of the data being sent. Therefore, a new importer had to be created to be able to parse this format.

How?
The importer was written based on the rules sent by Fiona.

Testing?
Unit tests have been written. A QA of the variant counts has also been completed and signed off by Fiona.

genotypes.append(genotype)
end
end
genotype.add_test_scope(:no_genetictestscope)

Contributor

An unknown scope has been added here @lauramccluskey1, but it is not considered while processing the records. Should there be an else condition so that these cases are processed in fill_genotypes?

Contributor Author

Great spot @shilpigoeldev! I have just spoken to Fiona about this, as it's not clear what to do in this instance for St Georges. She said that, as there were no cases with no genetictestscope, this should give a big error instead. This means that if we see any such cases in future, new rules can be created. So I will add this as a logger message. @NImeson this applies to colorectal too.

Contributor Author

Logger error added in 9104764
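
A minimal sketch of the behaviour agreed in this thread, not the change made in 9104764. The `assign_test_scope` helper, the `SketchGenotype` class, the scope patterns, and the `:full_screen` and `:targeted_mutation` symbols are all hypothetical; only `add_test_scope(:no_genetictestscope)` comes from the diff. It shows an else branch that logs an error when no scope rule applies, so that unexpected records are visible rather than silently skipped.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the importer's genotype object.
class SketchGenotype
  attr_reader :scope

  def add_test_scope(scope)
    @scope = scope
  end
end

# Illustrative only: the method name and patterns are assumptions,
# not the rules used by the real importer.
def assign_test_scope(genotype, raw_scope, logger)
  case raw_scope.to_s
  when /full/i
    genotype.add_test_scope(:full_screen)
  when /targeted/i
    genotype.add_test_scope(:targeted_mutation)
  else
    # No rule exists for this case, so flag it loudly instead of guessing.
    genotype.add_test_scope(:no_genetictestscope)
    logger.error("No genetic test scope rule for #{raw_scope.inspect}")
  end
  genotype
end

log_output = StringIO.new
logger = Logger.new(log_output)
unknown = assign_test_scope(SketchGenotype.new, 'something new', logger)
```

Logging at error level keeps the import running while still surfacing the record, which matches the stated aim of writing new rules once such cases appear.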

process_variants(single_genotype, record)
@persister.integrate_and_store(single_genotype)
end
genotypes

Contributor

Why do you need this, @lauramccluskey1?

Contributor Author

I needed it to be able to write unit tests for that function.

genotype.attribute_map['organisationcode_testresult'] = '697N0'
end
# records using new importer should only have SRIs starting with V
return unless record.raw_fields['servicereportidentifier'].start_with?('V')

Contributor

The return statement should come first; there is no need to create the genotype object if we are not processing the file @lauramccluskey1.

Contributor Author

Updated in 08d5202
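
The reviewer's point can be sketched as a guard clause placed before any object creation. This is an illustration under stated assumptions rather than the code in 08d5202: `SketchRecord` and `build_genotype` are hypothetical names, a plain hash stands in for the genotype object, and the `to_s` call assumes the identifier may be missing.

```ruby
# Hypothetical stand-in for the importer's record object.
class SketchRecord
  attr_reader :raw_fields

  def initialize(raw_fields)
    @raw_fields = raw_fields
  end
end

# Hypothetical method name; a hash stands in for the genotype object.
def build_genotype(record)
  # Guard first: records for the new importer have SRIs starting with 'V'.
  # to_s covers a missing identifier (an assumption about the data).
  return unless record.raw_fields['servicereportidentifier'].to_s.start_with?('V')

  # The object is only created for records that will be processed.
  { 'organisationcode_testresult' => '697N0' }
end

skipped = build_genotype(SketchRecord.new({ 'servicereportidentifier' => 'S123' }))
built = build_genotype(SketchRecord.new({ 'servicereportidentifier' => 'V123' }))
```

Here `skipped` is nil because the guard returns before anything is built, while `built` holds the populated object.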

genotype.add_status(1)
# variant dna is not '*Fail*', 'N' or null AND raw:gene is not null AND raw:gene (other) is null
# 2 (abnormal) for gene in raw:gene. 1 (normal) for all other genes.
elsif record.raw_fields['gene'].present? \

@shilpigoeldev shilpigoeldev Jun 13, 2024

Contributor

You don't need backslashes for continuation here (or elsewhere in the file). It can be written as:

record.raw_fields['gene'].present? &&
record.raw_fields['gene (other)'].blank?

Contributor Author

If I remove the backslashes my tests fail and I get "syntax error, unexpected &&"

Contributor

@lauramccluskey1 Did you move the && operator to the end of the first line? The version below will work:

def interrogate_variant_dna_column(record, genotype, genes, column, gene)
  # For full screen tests only- add test status when variant dna column is not empty
  if record.raw_fields['variant dna'].match(/Fail/ix)
    genotype.add_status(9)
  elsif record.raw_fields['variant dna'] == 'N'
    genotype.add_status(1)
  # variant dna is not '*Fail*', 'N' or null AND raw:gene is not null AND raw:gene (other) is null
  # 2 (abnormal) for gene in raw:gene. 1 (normal) for all other genes.
  elsif record.raw_fields['gene'].present? && record.raw_fields['gene (other)'].blank?
    update_status(2, 1, column, 'gene', genotype)
  # variant dna is not '*Fail*', 'N' or null AND raw:gene is not null AND raw:gene (other) is not null
  # 2 (abnormal) for gene in raw:gene.
  # 9 (failed, genetic test) for any gene specified WITH 'Fail' in raw:gene (other).
  # 1 (normal) for all other genes
  elsif record.raw_fields['gene'].present? && record.raw_fields['gene (other)'].present?
    if column == 'gene'
      genotype.add_status(2)
    elsif column == 'gene (other)'
      match_fail(gene, record, genotype)
    else
      genotype.add_status(1)
    end
  # variant dna not '*Fail*', 'N' or null AND raw:gene is null AND raw:gene(other) not a single gene
  # If gene is specified in raw:variant dna, assign 2 (abnormal) for that gene and 1 (normal) for
  # all other genes.
  # Else interrogate raw:gene (other).
  elsif record.raw_fields['gene'].blank? && (genes['gene (other)'].blank? ||
        genes['gene (other)'].length > 1) && !genes['variant dna'].nil? &&
        genes['variant dna'].length >= 1
    update_status(2, 1, column, 'variant dna', genotype)
  # variant dna is not '*Fail*', 'N' or null AND raw:gene is null AND raw:gene (other) specifies a single gene
  # 2 (abnormal) for gene in raw:gene (other). 1 (normal) for all other genes.
  elsif record.raw_fields['gene'].blank? && !genes['gene (other)'].nil? && genes['gene (other)'].length == 1
    update_status(2, 1, column, 'gene (other)', genotype)
  end
end

Contributor Author

Thanks @shilpigoeldev - updated in 960d753
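
For reference, a small sketch of the two continuation styles discussed in this thread. The field values are hypothetical, and plain Ruby `to_s.empty?` checks stand in for the `present?` and `blank?` helpers used in the importer so that the snippet runs without extra libraries.

```ruby
# Hypothetical field values, used only to exercise the condition.
raw_fields = { 'gene' => 'BRCA1', 'gene (other)' => '' }

# A trailing && tells the parser that the expression continues on the
# next line, so no backslash is needed.
gene_only = !raw_fields['gene'].to_s.empty? &&
            raw_fields['gene (other)'].to_s.empty?

# The backslash form is equivalent. It is needed when the operator starts
# the following line, the layout that produced the syntax error reported
# above once the backslashes were removed.
gene_only_backslash = !raw_fields['gene'].to_s.empty? \
                      && raw_fields['gene (other)'].to_s.empty?
```

Both variables evaluate to true for these values; the only difference is where the operator sits relative to the line break.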

def positive_exonvariant?(record)
variant = record.raw_fields['genotype']
variant.scan(EXON_VARIANT_REGEX).size.positive?
def match_fail(gene, record, genotype)

Contributor

This method can be refactored:

def match_fail(gene, record, genotype)
  # Determines if a gene in the gene (other) column has failed
  # Assigns genes that have failed a test status of 9, otherwise teststatus is 1

  gene_list = record.raw_fields['gene (other)'].scan(BRCA_GENE_REGEX)
  return false if gene_list.empty?

  gene_list.each do |gene_value|
    mapped_gene_values = BRCA_GENE_MAP[gene_value] || []

    mapped_gene_values.each do |value|
      if value == gene
        status = /#{gene_value}\s?\(?fail\)?/i.match?(record.raw_fields['gene (other)']) ? 9 : 1
        genotype.add_status(status)
      end
    end
  end

  true
end

Contributor Author

Updated in 08d5202
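
To illustrate what the fail-detection expression in the suggested refactor matches, here is a minimal sketch that isolates it. The `status_for` helper and the sample strings are hypothetical; the regular expression is the one from the suggestion. As there, the gene name is interpolated without escaping, which assumes gene names contain no regex metacharacters.

```ruby
# Returns 9 (failed) when the gene name is followed by an optional space
# and "fail", with or without parentheses; otherwise returns 1 (normal).
# status_for is a hypothetical helper, not part of the importer.
def status_for(gene_value, gene_other)
  /#{gene_value}\s?\(?fail\)?/i.match?(gene_other) ? 9 : 1
end

failed_status = status_for('BRCA1', 'BRCA1 (Fail), BRCA2')
normal_status = status_for('BRCA2', 'BRCA1 (Fail), BRCA2')
compact_status = status_for('BRCA1', 'BRCA1fail')
```

With these inputs BRCA1 is treated as failed in both the parenthesised and compact spellings, while BRCA2, which has no trailing "fail", is treated as normal.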

@shilpigoeldev shilpigoeldev marked this pull request as draft June 19, 2024 11:58
@shilpigoeldev shilpigoeldev marked this pull request as ready for review June 20, 2024 10:14
@shilpigoeldev shilpigoeldev requested a review from NImeson June 20, 2024 10:14
process_failed_gene_other(genes, genotype, genotypes, remaining_genes)
when /\?\z/i
process_status_genes(genotype, 4, genes['gene'], genotypes)
when /^c\.|^Ex.*Del\z|^Ex.*Dup\z|^Het\sDel|^Het\sDup/ix

Collaborator

As confirmed by Fiona, the regex needs to account for entries such as 'Hetdel' as well. These are currently assigned test status 4, but they should be assigned 2.

Contributor

Thanks @NImeson, updated and test added in add51ac.
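
One way the pattern could be relaxed is sketched below. This is an assumption about the shape of the fix, not the change made in add51ac: the constant names and sample strings are hypothetical, and the only difference between the two patterns is that the whitespace after `Het` becomes optional.

```ruby
# Pattern from the diff context above.
ORIGINAL_VARIANT_REGEX = /^c\.|^Ex.*Del\z|^Ex.*Dup\z|^Het\sDel|^Het\sDup/ix

# Assumed fix: make the whitespace optional so 'Hetdel' and 'Hetdup' match.
UPDATED_VARIANT_REGEX = /^c\.|^Ex.*Del\z|^Ex.*Dup\z|^Het\s?Del|^Het\s?Dup/ix

# Hypothetical sample values.
samples = ['Het Del', 'Hetdel', 'HetDup', 'c.123A>G']
original_hits = samples.select { |s| ORIGINAL_VARIANT_REGEX.match?(s) }
updated_hits = samples.select { |s| UPDATED_VARIANT_REGEX.match?(s) }
```

The original pattern selects only 'Het Del' and 'c.123A>G' from these samples, whereas the relaxed pattern selects all four, so the unspaced spellings would reach the branch that assigns test status 2.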

@shilpigoeldev shilpigoeldev requested a review from NImeson June 21, 2024 12:54

@NImeson NImeson left a comment

Collaborator

All looks good :-)

@shilpigoeldev shilpigoeldev merged commit 6e1a6c9 into develop Jun 21, 2024
@shilpigoeldev shilpigoeldev deleted the feature/35869/st_georges_importer branch June 21, 2024 14:21

3 participants