Feature/31243/leeds crc refactor#76
Conversation
bshand
left a comment
There was a problem hiding this comment.
I've requested a small performance-related change. Otherwise, this looks OK to me, though I don't really understand enough about the logic to easily follow the refactoring.
| @genes_hash = YAML.safe_load(File.open(Rails.root.join(GENES_FILEPATH))) | ||
| @status_hash = YAML.safe_load(File.open(Rails.root.join(STATUS_FILEPATH))) |
There was a problem hiding this comment.
I think these would be better populated in the #initialize method (i.e. once per batch), instead of reloading YAML file for every record. Maybe .freeze the result, if you want to be sure the lookup isn't mutated.
| genotype = record.raw_fields['genotype'].downcase.strip | ||
| mtype = record.raw_fields['moleculartestingtype'].downcase.strip | ||
| if genotype.match(/pms2\s-\smlpa\spred\snegative\sc5/i) || | ||
| genotype.match(/pms2\s-\sconf\smlpa\spositive\sc5/i) |
There was a problem hiding this comment.
I can't see this in the rules? Can you send me the updated ones please?
There was a problem hiding this comment.
Have mailed you the rules , and following -
<style> </style>| Raw:genotype | Test Scope |
|---|---|
| PMS2 - MLPA pred negative C5 | Targeted |
| PMS2 - Conf MLPA positive C5 | Targeted |
There was a problem hiding this comment.
Thank you! This makes sense now
| raw_report = report | ||
| exclude_statements = [ | ||
| 'Screening for mutations in MLH1, MSH2 and MSH6 is now in progress as requested.', | ||
| 'MLPA and MSH2 analysis was not requested.', |
There was a problem hiding this comment.
Would it be good to add these to the constants file instead?
There was a problem hiding this comment.
Done in latest commit .
|
|
||
| def add_variant_class(variant) | ||
| if variant.is_a?(Integer) && variant >= 1 && variant <= 5 | ||
| if variant.is_a?(Integer) && variant >= 1 && variant <= 7 |
There was a problem hiding this comment.
As this is an update to the assigning variant class in the genotype.rb file, have you checked that it won't affect any of the other importers?
There was a problem hiding this comment.
Its extending the range that's allowed to be in variantpathclass and adheres to Zvariantpathclass. If any other importer is able to extract c4/5 pathogenicity values and map upto 7 in future that should be fine.
| end | ||
| end | ||
|
|
||
| def allocate_genes(_record) |
There was a problem hiding this comment.
record is not being used by the function- can this be removed?
There was a problem hiding this comment.
Done in latest commit
|
Can you also add more comments please? |
Added now in latest commit at places and left for methods where name seems self explanatory, thanks @lauramccluskey1 ! |
| populate_and_persist_genotype(record) | ||
| end | ||
|
|
||
| # populate class variables that can be used over different methods |
There was a problem hiding this comment.
I think these are instance variables, may be worth changing the comment
There was a problem hiding this comment.
Thanks @ollietulloch , done in latest commit .
| end | ||
|
|
||
| # checks if file has colorectal tests and should be processed under this importer | ||
| def should_process |
There was a problem hiding this comment.
If this only returns true of false, could this be a predicate method?
There was a problem hiding this comment.
Thanks @ollietulloch , done in latest commit .
| def process_variants_from_record(genocolorectal, record) | ||
| genotypes = [] | ||
| allocate_genes | ||
| find_test_status(record) |
There was a problem hiding this comment.
A small thing but naming methods consistently is really helpful regarding readability. allocate_genes and find_test_status do similar things, i.e. setting an instance variable, so perheps calling it allocate_test_status might be more consistent?
There was a problem hiding this comment.
Thanks @ollietulloch , done in latest commit .
| @teststatus | ||
| end | ||
|
|
||
| def exceptinal_teststatus |
There was a problem hiding this comment.
Should this be exceptional_teststatus?
There was a problem hiding this comment.
Thanks @ollietulloch , done in latest commit .
| @teststatus | ||
| end | ||
|
|
||
| def exceptinal_teststatus |
There was a problem hiding this comment.
Can you update to "exceptional" please?
| else | ||
| @logger.debug 'Cannot determine test status for : '\ | ||
| "#{record.raw_fields['report']} a priori" | ||
| def process_normal_targ(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
record is not being used by this function- can it be removed?
| genotypes << genocolorectal | ||
| end | ||
|
|
||
| def process_abnormal_targ(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
record is not being used by this function- can it be removed?
| end | ||
| end | ||
|
|
||
| def process_normal_fs(genocolorectal, _record, genotypes) |
There was a problem hiding this comment.
record is not being used by this function- can it be removed?
|
|
||
| EXONIC_REPORT_REGEX = /(?<report>(#{GENES})\sexon(s)?[\w\s\-.>():=,&]+)/ix | ||
|
|
||
| PATHOGENIC_REPORT_REGEX = /(?<report>pathogenic\s(#{GENES})[\w\s\-.>():=,&]+)/ix |
There was a problem hiding this comment.
If you have "." as one of the options in the square brackets, do you need the rest of the options if "." includes all of the other characters you have listed?
| ex(on)?s?\s?(?<exons>[0-9]+(-[0-9]+)?)\s? | ||
| (?<variant>del|dup|ins)| | ||
| ex(on)?s?\s?(?<exons>[0-9]+\s?(\s?-\s?[0-9]+)?)\s? | ||
| (?<variant>del|dup|ins)?| |
There was a problem hiding this comment.
Are lines 60-61 similar to 58-59 but with a few extra optional values- could this be simplified?
| cdna_vars = get_cdna_mutations(report_variants) | ||
| if cdna_vars.size > 1 | ||
| proteins = report_variants.scan(PROTEIN_REGEX).flatten.uniq | ||
| cdna_vars.zip(proteins).each do |cdna_mutation, protein| |
There was a problem hiding this comment.
Is this using an assumption that the the order of the cdna will match the order of the protein in the sentence? What happens there isn't a protein listed for every cdna? Would that mean they get matched incorrectly?
| genotypes << genocolorectal_dup | ||
| end | ||
|
|
||
| def extract_targeted_genes |
There was a problem hiding this comment.
Check if this method is needed- as discussed on call
What?
I have refactored Leeds CRC importer with latest rules provided by Fiona , as per Planio ticket 35286
Why?
Whole Leeds CRC had to be re-written again as older code wasn't getting the right denominators and counts.
How?
Have re-written Leeds CRC importer as per rules sheet provided by Fiona.
Testing?
QA signed off by Fiona after sharing the counts. Also added handler tests for each kind of case.
Anything Else?
It still has rubocop errors for existing code in parent classes like genocolorectal.rb, which can't be edited in this scope of work as being used across all importers.