
Genetic data and electronic health records: a discussion of ethical, logistical and technological considerations

Kimberly Shoenbill, Norman Fost, Umberto Tachinardi, Eneida A Mendonca
DOI: http://dx.doi.org/10.1136/amiajnl-2013-001694, pages 171–180. First published online: 1 January 2014


Objective: The completion of sequencing the human genome in 2003 has spurred the production and collection of genetic data at ever increasing rates. Genetic data obtained for clinical purposes, as is true for all results of clinical tests, are expected to be included in patients' medical records. With this explosion of information, questions of what, when, where and how to incorporate genetic data into electronic health records (EHRs) have reached a critical point. In order to answer these questions fully, this paper addresses the ethical, logistical and technological issues involved in incorporating these data into EHRs.

Materials and methods: This paper reviews journal articles, government documents and websites relevant to the ethics, genetics and informatics domains as they pertain to EHRs.

Results and discussion: The authors explore concerns and tasks facing health information technology (HIT) developers at the intersection of ethics, genetics, and technology as applied to EHR development.

Conclusions: By ensuring the efficient and effective incorporation of genetic data into EHRs, HIT developers will play a key role in facilitating the delivery of personalized medicine.

  • Electronic Health Records
  • Medical Informatics
  • Genomics
  • Individualized Medicine
  • Ethics, Medical


Initial sequencing of the human genome in 2003 launched a new era of genetic research that promised personalized medicine with improved diagnosis, prevention and treatment of disease (box 1). Before such promises can be fulfilled, many questions surrounding the incorporation of genetic data into electronic health records (EHRs) must be answered. As noted by Lose,1 “even before the race to integrate genomic medicine into the daily routine begins, the initial question of what to disseminate remains.” A common misperception is that the field of genetics is advancing faster than the ethical standards needed to guide its use. In fact, guidelines on ethically responsible genetic screening were proposed in 1975 by the Committee for the Study of Inborn Errors of Metabolism of the National Academy of Sciences and reiterated in an Institute of Medicine report in 1994, and similar guidelines have been promulgated by the World Health Organization and other groups.2–5 Despite wide support for these principles, in practice they are often not followed.6 This paper examines many of the same ethical and logistical problems anticipated by the 1975 committee and evaluates them in relation to the technological issues faced in the current era of EHRs, genomic medicine and big data. This examination is based on a review of journal articles, government documents and websites addressing ethical, genetic, technology and infrastructure concerns as they relate to EHRs (figure 1). The rapid pace of genetic research necessitates urgent focus on integration solutions that minimize risk while maximizing benefit. Box 1 provides a glossary of relevant genetic and genomic terms.

Figure 1

Detailed description of search strategy and results.

Box 1


CDS (clinical decision support)—“a process for enhancing health-related decisions and actions with pertinent, organized clinical knowledge and patient information to improve health and healthcare delivery”7

Chromosome—the self-replicating structure of cells made of genes8

CLIA (clinical laboratory improvement amendments)—federally mandated oversight of the quality of laboratory tests to ensure the accuracy, reliability and timeliness of patient test results9

Clinical tests—“those [tests] in which specimens are examined and results reported to the provider for medical purposes, such as diagnosis, prevention or treatment in the care of individual patients”10

DICOM (digital imaging and communications in medicine)—a standard method for medical images and associated information to be exchanged between devices manufactured by various vendors11

Direct to consumer (DTC) genetic test—“genetic tests marketed and offered directly to consumers”10

DNA (deoxyribonucleic acid)—the chemical that contains instructions to direct cell activity8

Exome sequencing—determination of the order of base sequences in all the protein coding regions of an organism's genome8

Gene—the basic functional unit of heredity, made of a sequence of base pairs that provides instructions for making proteins8

Genetic tests—“the analysis of human DNA, RNA, chromosomes, proteins or certain metabolites in order to detect alterations related to a heritable disorder”12

Genome—an organism's complete set of DNA8

Genome-wide association studies—large research projects that evaluate thousands of genomes looking for variants (usually single nucleotide polymorphisms) that are more common in subjects with a given disorder than in subjects without it13

Mendelian inheritance genetic test—“[a genetic test that] focuses on one or a few genes, which are chosen because of the medical and family history and are strongly associated with disease”14

Microarray—a device consisting of a glass slide to which many separate DNA sequences are attached in a pattern. Messenger RNA (mRNA) binds to the DNA from which it was originally made; when mRNA from a sample is placed on the slide, the amounts and types that bind indicate which DNA sequences (genes) are expressed in the sample.15

PACS (picture archiving and communication system)—a medical imaging system that provides efficient storage and access to medical images from multiple sources16

Personalized medicine—“an emerging practice of medicine that uses an individual's genetic profile to guide decisions made in regard to the prevention, diagnosis, and treatment of disease. Knowledge of a patient's genetic profile can help doctors select the proper medication or therapy and administer it using the proper dose or regimen.”17

Personal genomic screening—“a rapidly evolving area of DTC genetic testing which is based on multiple statistical comparisons, or genome-wide associations”14

Research tests—“those [tests] in which specimens are examined for the purpose of gaining fundamental scientific knowledge, or for early stage development of a clinical test”10

RNA (ribonucleic acid)—a chemical found in the nucleus of cells that codes for proteins and is studied to determine which genes in a cell are expressed15

SNP (single nucleotide polymorphism)—a DNA sequence variation that occurs when a single base in the genome sequence is changed8

Whole genome sequencing—determination of the order of base sequences in an organism's complete set of DNA8

Classifying genetic tests

To identify tests that should be included in EHRs, an evaluation of validity, utility and intended purpose is required. As outlined by the Secretary's Advisory Committee on Genetic Testing, and summarized by Lose,1 “Three concepts are central to understanding how knowledge graduates to evidence sufficient to alter clinical practice. The first is analytical validity which is defined as the likelihood that the reported results are correct. The second, clinical validity is the degree to which the test correctly assesses risk of health or disease [eg, the test's sensitivity, specificity, incidence of false positive and false negative results, and positive and negative predictive value]. Finally, clinical utility is the degree to which the test guides medical management [including whether there is scientific evidence for interventions that are safe, effective and available to the individual being tested].”1, 18
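The clinical validity measures named in this definition can all be read off a two-by-two table comparing test results with a reference standard. The Python sketch below shows how sensitivity, specificity and the positive and negative predictive values are computed; the counts are hypothetical numbers invented for illustration, and the class and function names are our own rather than part of any cited tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a genetic test against a reference standard."""
    true_pos: int   # test positive, condition present
    false_pos: int  # test positive, condition absent
    false_neg: int  # test negative, condition present
    true_neg: int   # test negative, condition absent


def clinical_validity_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity and predictive values from a 2x2 table."""
    return {
        "sensitivity": c.true_pos / (c.true_pos + c.false_neg),
        "specificity": c.true_neg / (c.true_neg + c.false_pos),
        "ppv": c.true_pos / (c.true_pos + c.false_pos),
        "npv": c.true_neg / (c.true_neg + c.false_neg),
    }


# Hypothetical counts for 1000 tested people, 100 of whom have the condition.
counts = ConfusionCounts(true_pos=90, false_pos=45, false_neg=10, true_neg=855)
for name, value in clinical_validity_metrics(counts).items():
    print(f"{name}: {value:.3f}")
# sensitivity: 0.900, specificity: 0.950, ppv: 0.667, npv: 0.988
```

Even with 90% sensitivity and 95% specificity, one in three positive results in this made-up example is a false positive, which is why the definition lists predictive values alongside sensitivity and specificity.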

As genetic tests in research protocols are often still under investigation, are not required to follow clinical laboratory improvement amendments (CLIA) regulations, and might not have established clinical validity or utility, research results are often not reported to subjects or included in EHRs.10, 19–29 However, debate on this continues.19, 21, 26, 29 A summary of current policies from cancer genomic studies and a recent guideline from the National Heart, Lung and Blood Institute provide a framework for determining whether results should be returned to study participants.19, 26 These articles agree that data returned to patients should be analytically valid, be clinically valid and have established benefit for patients.19, 26 Assessment of clinical validity should distinguish results that identify known diseases from conditions that are merely biological variations of uncertain clinical significance.30 If a research protocol anticipates the return of individual results, participants' informed preferences should be documented, specifying the type of information desired (expected results, incidental findings or no results) and whether the information should be made available to personal physicians or relatives and/or included in EHRs.
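As a sketch of how such documented preferences might be captured in a structured, machine-checkable form, the Python example below models the choices listed above: which kinds of findings are wanted (an empty set meaning "no results") and which recipients beyond the participant are permitted. The type names and the release rule are hypothetical illustrations, not a published consent standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Finding(Enum):
    EXPECTED = "expected result"
    INCIDENTAL = "incidental finding"


class Recipient(Enum):
    PARTICIPANT = "participant"
    PERSONAL_PHYSICIAN = "personal physician"
    RELATIVES = "relatives"
    EHR = "electronic health record"


@dataclass(frozen=True)
class ReturnOfResultsConsent:
    """Hypothetical record of a participant's documented preferences.

    An empty `wants` set corresponds to the choice of receiving no results.
    """
    wants: frozenset[Finding] = field(default_factory=frozenset)
    share_with: frozenset[Recipient] = field(default_factory=frozenset)

    def may_release(self, finding: Finding, recipient: Recipient) -> bool:
        """True only if this kind of finding was requested and the recipient is allowed."""
        if finding not in self.wants:
            return False
        # The participant receives requested findings; anyone else must be listed.
        return recipient is Recipient.PARTICIPANT or recipient in self.share_with


consent = ReturnOfResultsConsent(
    wants=frozenset({Finding.EXPECTED}),
    share_with=frozenset({Recipient.PERSONAL_PHYSICIAN}),
)
print(consent.may_release(Finding.EXPECTED, Recipient.PERSONAL_PHYSICIAN))  # True
print(consent.may_release(Finding.EXPECTED, Recipient.EHR))                 # False
print(consent.may_release(Finding.INCIDENTAL, Recipient.PARTICIPANT))       # False
```

Recording preferences as data rather than free text would let software check each release, including any upload to the EHR, against what the participant actually authorized.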

Unlike research genetic tests, clinical genetic tests in the USA must be performed in CLIA-certified laboratories, offering assurance of analytical validity. Clinical genetic tests are obtained to diagnose, treat or manage a patient's medical condition and traditionally have been used to diagnose monogenic inherited diseases with high penetrance (Mendelian inherited disorders).14 Traditional clinical genetic tests are ordered and interpreted by providers trained in genetics, and generally have high analytical validity, clinical validity and utility.13 In contrast, personal genomic tests can be ordered by providers or by lay persons (direct to consumer (DTC)) and are often not validated.13, 31, 32 Many personal genomic tests draw on genome-wide association studies to predict the likelihood of a person having or developing a particular disease, or to predict a patient's response to a drug. These studies examine the genomes of many people, typically at hundreds of thousands of positions, and determine which variations (single nucleotide polymorphisms (SNPs)) are more common in people with a particular condition.31
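The comparison underlying a genome-wide association study can be illustrated for a single SNP. The Python sketch below uses invented allele counts for cases and controls and computes the allele frequencies, the allelic odds ratio and a Pearson chi-square statistic (without continuity correction). It is a simplified illustration, not the analysis pipeline of any cited study; real studies repeat such tests across hundreds of thousands of SNPs and must correct for that multiplicity.

```python
def allele_frequency(alt: int, ref: int) -> float:
    """Fraction of observed alleles that carry the variant (alt) allele."""
    return alt / (alt + ref)


def allelic_odds_ratio(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds of carrying the alt allele in cases relative to controls."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical allele counts (two alleles per person; 1000 cases, 1000 controls).
case_alt, case_ref = 600, 1400
ctrl_alt, ctrl_ref = 500, 1500

print(f"case frequency:    {allele_frequency(case_alt, case_ref):.2f}")  # 0.30
print(f"control frequency: {allele_frequency(ctrl_alt, ctrl_ref):.2f}")  # 0.25
print(f"odds ratio:        {allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):.2f}")  # 1.29
print(f"chi-square:        {chi_square_2x2(case_alt, case_ref, ctrl_alt, ctrl_ref):.2f}")  # 12.54
```

Associations of this modest size are common in such studies, which is one reason a statistically significant SNP need not translate into a predictive test with clinical validity or utility.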

The public's growing interest in personal genomic testing, coupled with limited government oversight (including limited validation and clinical guidelines for these tests), has encouraged the growth of DTC companies.18, 33–36 Despite improvements in the analytical validity of tests offered by these companies, concern regarding the clinical validity and utility of these tests remains high among scientific and medical communities.13, 14, 21, 31, 37–43 Although DTC companies offer some monogenic tests that have shown clinical validity in specific populations, the validity may not be generalizable or applicable to different ethnic groups.14 In addition, most DTC personal genomic tests based on SNP associations with diseases or traits have not yet shown clinical validity or utility, especially in terms of outcome evaluations.13, 14, 37, 44 Because of these limitations and concerns about inappropriate marketing, the Federal Trade Commission issued a consumer warning about at-home genetic tests, and the US Food and Drug Administration and the Centers for Disease Control and Prevention (CDC) have advised that “at-home genetic tests are not a suitable substitute for a traditional healthcare evaluation.”45

Ethical issues surrounding genetic data

When considering incorporation of genetic data into EHRs, one must realize that as analytical validity, clinical validity, and utility decline, potential risks can outweigh potential benefits.37, 46 Data that are inaccurate, unreliable and/or not useful in guiding care have a low likelihood of improving patients' lives but may still cause harm (figure 2). The potential risks of these tests are outlined in box 2.37, 46–54

Figure 2

Examples of harm. Case 1 involves personal harms of confusion and refusal of recommended care because of false reassurance from genomic test. Case 2 relates to personal harm of confusion in diagnosis. Case 3 involves personal harm of confusion prompting changes in reproductive plans and financial harm with increased use of medical consultation.

Box 2

Potential harms of genetic testing


Anxiety, depression, confusion, changes in life plans, changes in reproductive plans, parental guilt about passing on a deleterious mutation, survivor guilt about not having a deleterious genetic mutation when other family members do, refusal of recommended medical care because of false reassurance from an invalid genetic test


Stigmatization, breach of confidentiality, identification of misattributed parentage, privacy concerns or desire not to know a genetic result that may be at odds with family members' desire to know a genetic result, misuse of genetic data (surreptitious DNA testing or transfer of genetic data to third parties after sale of direct to consumer testing company)


Employment and training concerns (see legislative discussion below)


Increased use of medical consultation and follow-up tests, employment and insurance concerns (see legislative discussion below)


Discrimination in obtaining health, disability, life and long-term care insurance (see legislative discussion below)

Health information technology (HIT) professionals play a key role in protecting against these harms through effective data security and data governance. Ensuring high security (access management) in the use, storage and sharing of genetic data minimizes the risk of a confidentiality breach, which in turn can lessen other risks such as stigmatization, discrimination and family conflict. Central to this goal is the appropriate categorization and management of genetic data in relation to other data within EHRs. Proponents of ‘genetic exceptionalism’ recommend completely separate methods to protect, store and access genetic data because of the high potential for the harms noted above.55, 56 Others argue that genetic data are not sufficiently different from other medical data to justify different security measures.46, 55–58 In discerning how to store and secure access to different components of clinical records, Green and Botkin58 recommend evaluating all data based on four characteristics: “(1) [the] degree to which information learned can be stigmatizing, (2) the effect of the test results on others, (3) the availability of effective interventions to alter the natural course predicted by the information, and (4) the complexity involved in interpreting test results.” We agree that tests that raise problems associated with these characteristics should be treated with extra caution, involving extra security and high standards for consent, similar to how HIV and mental health data are managed. However, genetic data are unique in combining the permanence of biomolecular observations with the impermanence of their interpretations. HIT professionals will play key roles in designing methods that allow ready access to and use of validated genetic data; thorough documentation of when, how and where these data were obtained; and storage of genetic data that are not yet ready to be incorporated into EHRs.
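One way an EHR could operationalize Green and Botkin's four characteristics is as explicit flags attached to each test type, with any raised flag routing results to a restricted access tier. The Python sketch below is hypothetical: the field names, the "any flag raised" rule and the tier labels are our assumptions for illustration, not part of the cited recommendation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class GeneticTestProfile:
    """Hypothetical yes/no answers to Green and Botkin's four questions for one test."""
    stigmatizing: bool                  # (1) results could be stigmatizing
    affects_others: bool                # (2) results carry implications for relatives
    lacks_effective_intervention: bool  # (3) no intervention can alter the predicted course
    complex_to_interpret: bool          # (4) results are complex to interpret


def raised_concerns(test: GeneticTestProfile) -> list[str]:
    """Names of the characteristics that call for extra caution, in field order."""
    return [f.name for f in fields(test) if getattr(test, f.name)]


def access_tier(test: GeneticTestProfile) -> str:
    """Any raised concern moves results to a restricted tier (hypothetical labels)."""
    return "restricted" if raised_concerns(test) else "standard"


example = GeneticTestProfile(stigmatizing=True, affects_others=True,
                             lacks_effective_intervention=True,
                             complex_to_interpret=False)
print(access_tier(example), raised_concerns(example))
# restricted ['stigmatizing', 'affects_others', 'lacks_effective_intervention']
```

Keeping the list of raised concerns, rather than only the resulting tier, documents why a result received extra protection, which supports later review if the assessment of a test changes.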

Maximizing benefit and minimizing harm requires the provision of informed consent before genetic testing along with post-test counseling so individuals have adequate information to make their own decisions.59, 60 Predictive genetic testing, especially non-targeted, multiple-gene testing or whole genome or exome sequencing, challenges providers' abilities to complete these tasks. Due to the large volume of information produced by these tests, lack of validation of many of the results, and scarcity of trained specialists to provide genetic counseling, patients may be inadequately prepared to make informed choices regarding testing or follow-up. In addition, many healthcare providers and lay persons possess limited understanding of genetics and statistics, which may result in unfounded worry, requests for unnecessary tests or interventions, and irrational reproductive decisions.61–64 HIT professionals can aid progress towards the goal of informed consent through research and development of interactive genetic computerized clinical decision support (CDS) tools, the provision of easy access to genetic information databases, and the development of online genetic and statistical educational resources.65–72 Developers can also help create efficient online consent forms for genetic testing, research participation and incorporation of genetic data into EHRs.

Key legislative mandates related to genetic testing, insurance and employment are outlined in box 3. Discrimination in access to insurance and employment opportunities related to genetic testing has been discussed in many articles.47–50, 52 These US laws were enacted to minimize the risk of employment and health insurance discrimination due to genetic findings. However, discrimination in eligibility for disability, life or long-term care insurance is still not formally addressed by federal statutes. Also, despite the passage of the Genetic Information Nondiscrimination Act in 2008, concern and controversy regarding unfair treatment due to a genetic predisposition persist.76 HIT professionals can assist in this area by ensuring high security and Health Insurance Portability and Accountability Act (HIPAA) compliance in the design of patient care software, by allowing easy updates should relevant laws change, and by enabling effective, efficient and HIPAA-compliant research access to genetic data as authorized by research subjects.77

Box 3

Legislative acts relevant to genetic testing

Health Insurance Portability and Accountability Act of 1996 (HIPAA) 73

Sets standards on how protected health information should be controlled

Does not apply to many companies or laboratories that perform direct to consumer genetic testing and analysis

Protects against genetic discrimination in employer-sponsored group health plans

Genetic Information Nondiscrimination Act of 2008 (GINA) 74

Extends HIPAA protections by making it illegal to use genetic information to underwrite group and individual health insurance

Prohibits employers from making employment decisions based on genetic information

Does not address life insurance, disability insurance or long-term care insurance discrimination

Does not apply to health benefits for federal employees, members of the military, veterans seeking healthcare through the Department of Veterans Affairs, or the Indian Health Service

Does not apply to athletic programs

Patient Protection and Affordable Care Act of 2010 (ACA)75

Prohibits health insurers from determining eligibility for coverage based on signs and symptoms of genetic disease

Changes in 2014: prohibits differences in premiums according to health status and genetic information

Logistical issues surrounding genetic data incorporation into EHRs

A fundamental barrier to the ethical and efficient incorporation of genetic data into EHRs is the lack of central validation and oversight of genetic tests.3, 40, 42, 78 The USA's current fragmented system can result in confusion and increased potential for harm due to incorrect interpretations, use of genetic tests of low validity and/or utility, or the under-utilization of tests with high validity and utility.79–81 Box 4, adapted from Fabsitz et al,19 presents online sources of information about genetic tests and their validity and utility.

Box 4

Online sources of genetic test information

CDC Public Health Genomics 82

Provides information on multiple topics in genomics and links to specific topic sites (such as EGAPP and GAPP KB below). It also provides an alphabetized list of websites and topics related to genomics (genomic resources)


Evaluation of Genomic Applications in Practice and Prevention (EGAPP) 83

Established in 2004 by the CDC's Office of Public Health Genomics

Goal: “establish a systematic, evidence based process for assessing genetic tests and the application of genetic technology in transition from research to clinical and public health practice”


GeneTests 12

Sponsored by the University of Washington, Seattle and Bioreference Laboratories

Provides authoritative information from global genetic experts on genetic testing and its use in diagnosis, management and treatment of disease


Genetic Testing Registry (GTR) 84

Established in 2012 by the NIH

Goal: to provide “a central location for the voluntary submission of genetic test information by providers (including) the test's purpose, methodology, validity, evidence of the test's usefulness, and laboratory contacts and credentials (in order to) advance the public health and research into the genetic basis of health and disease”

Intended audience: providers and researchers

No NIH oversight (voluntary submission of information)

GTR to replace GeneTests Laboratory directory in 2013


Genomic Applications in Practice and Prevention Network (GAPPNet) 85

Established in 2009 by the CDC's Office of Public Health Genomics, the National Cancer Institute's Division of Cancer Control and Population Sciences and other stakeholders

Goal: “to accelerate and streamline effective and responsible use of validated and useful genomic knowledge and applications, such as genetic tests, technologies, and family history, into clinical and public health practice”


Genomics Applications in Practice and Prevention Knowledge Base (GAPP KB) 86

Supported by the CDC's Office of Public Health Genomics

Goal: to provide access to information on applications of genomic research to healthcare. This site provides links to four other sites that provide information on genomic applications (GAPPFinder), genomic test evidence (the evidence aggregator), genomic research studies archive (the project locator), and the online journal of genomic research (PLoS Currents: Evidence on Genomic Tests)


Human Genome Epidemiology Network (HuGENet) 87

Established by the CDC's Office of Public Health Genomics

A volunteer collaborative effort “to help translate genetic research findings into opportunities for preventive medicine and public health.”


Institute of Medicine of the National Academies: Roundtable on Translating Genomic-Based Research for Health 88

A non-profit, multidisciplinary collaborative effort of leaders from academia, industry and government to translate genomic research findings into health care, education and policy improvements


Online Mendelian Inheritance in Man (OMIM) 89

A catalog of human genes and genetic disorders maintained by experts in genetics at Johns Hopkins University

Intended audience: providers and researchers


Pharmacogenomics Knowledgebase (PharmGKB) 90

An authoritative “pharmacogenomics knowledge resource that encompasses clinical information including dosing guidelines and drug labels, potentially clinically actionable gene-drug associations and genotype–phenotype relationships.”


US DHHS Secretary's Advisory Committee on Heritable Disorders in Newborns and Children (SACHDNC) 91, 92

A website containing information on the SACHDNC's recommendations to the secretary of the US Department of Health and Human Services on the “most appropriate application of universal newborn screening tests, technologies, policies, guidelines and standards.”


It is clear that a central validation body for genetic tests is long overdue.2, 3 The evidence-based validation efforts of the Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group are exemplary; however, their difficulty keeping pace with the rapid development of new genetic tests is evident in the small number of tests for which they have offered guidance (six recommendations and eight evidence reports) compared with the 2687 tests currently listed in the National Institutes of Health's (NIH) Genetic Testing Registry.84 The rapid development of these tests is further documented by the GeneTests Laboratory directory growth chart, which lists just over 100 tests available in 1993 and over 2500 now.93 Before genomics can be used in daily clinical decision-making, more research is needed to establish systematic methods to ensure the analytical validity, clinical validity and utility of these tests.44 These investigations must evaluate morbidity and mortality outcomes along with expected healthcare cost savings from these predictive tests. Primary research studies and methodical reviews of peer-reviewed evidence on the benefits and harms of genetic tests, similar to the US Preventive Services Task Force methods, should be pursued.94

Another logistical problem with incorporating genetic data into EHRs is the need for more clinicians trained in genetics.2, 95, 96 Recommendations to improve genetic education in medical schools and guidelines on genetic competency for primary care physicians have been proposed.61, 97–102 Studies of primary care physicians have shown a poor understanding of genetics and dissatisfaction with support tools at the point of care.1, 101, 103–106 Computerized training tools used at providers' convenience and computerized CDS tools used at the point of care will be instrumental in facilitating clinicians' effective management of genetic issues.68, 103, 105, 107–110 Also, there is an urgent need to improve primary and secondary school math training in statistics, which is poorly understood by the majority of the population and is essential in understanding genetic testing.62, 64, 111 Here too, interactive computerized tools may facilitate faster, more accurate comprehension.112
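A short worked example shows why statistical literacy matters here. By Bayes' theorem, the positive predictive value (PPV) of a test depends not only on its sensitivity (Se) and specificity (Sp) but also on the prevalence (p) of the condition in the tested population. The numbers below are illustrative assumptions, not data from the cited studies.

```latex
\mathrm{PPV} = \frac{\mathrm{Se}\,p}{\mathrm{Se}\,p + (1-\mathrm{Sp})(1-p)}

\text{With } \mathrm{Se} = \mathrm{Sp} = 0.99,\ p = 0.001:\quad
\mathrm{PPV} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}
             = \frac{0.00099}{0.01098} \approx 0.09
```

Even a test that is 99% sensitive and 99% specific yields only about nine true positives for every hundred positive results when one person in a thousand has the condition, a result that many patients and clinicians find counterintuitive.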

Technological issues surrounding genetic data incorporation into EHRs

The US government is promoting the evaluation of genomics incorporation into clinical workflows.113 Currently, EHR systems handle genetic data like any other laboratory test. These solutions are not optimal, requiring HIT developers to collaborate with scientists and health professionals to redesign EHR systems to allow efficient, secure incorporation of genetic/genomic data.114 Workflows to be considered include genetic/genomic knowledge access at the point of care, CDS tool development, order entry, test tracking, result interpretation, result retrieval, result re-interpretation and tracking, and patient notification. Green and Guyer69 explained, “Existing clinical informatics architectures are largely incapable of storing genome sequence data in a way that allows the information to be searched, annotated and shared across healthcare systems over an individual's lifespan.” HIT developers working on genetic data incorporation must overcome many of the same difficulties that challenged developers of digital imaging and communications in medicine (DICOM) and picture archiving and communication systems (PACS).16 Radiological images and their interpretations are similar to genetic sequence data in that both require standardized methods to communicate unstructured and structured data, contain large volumes of data requiring compression, must interface with EHRs, contain identifiable patient information, and cannot be queried in their raw (unstructured) form.
Masys et al115 have addressed many of these challenges in their paper describing seven essential requirements for the incorporation of genomic data into EHRs: separation of genetic sequence and genetic interpretation data, lossless compression of the sequence data, documentation of laboratory method details (when, where, how data were obtained), use of short codes to represent large sequence data to allow faster retrieval and analysis, design for human and machine-readable forms of the data, design for future changes to the data, and design for clinical and research use of the data.
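Several of these requirements (separating sequence from interpretation, documenting when, where and how data were obtained, and designing for future changes) can be illustrated with a small data model. In the Python sketch below, which is hypothetical and not the design proposed by Masys et al, the observation is immutable while interpretations are appended over time, so earlier readings remain auditable. All class, field and variant names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SequenceObservation:
    """The permanent biomolecular finding plus when, where and how it was obtained."""
    variant: str        # e.g. a standardized variant description
    laboratory: str
    method: str
    obtained_on: date


@dataclass(frozen=True)
class Interpretation:
    """One dated reading of what the observation means clinically."""
    classification: str
    interpreted_on: date
    basis: str          # e.g. the report or knowledge base version used


@dataclass
class GenomicResult:
    observation: SequenceObservation
    interpretations: list[Interpretation] = field(default_factory=list)

    def reinterpret(self, interpretation: Interpretation) -> None:
        """Append new knowledge; earlier interpretations are never overwritten."""
        self.interpretations.append(interpretation)

    def current(self) -> Optional[Interpretation]:
        """Most recent interpretation, or None if none has been recorded."""
        return max(self.interpretations, key=lambda i: i.interpreted_on, default=None)


# Hypothetical usage: one observation, reinterpreted as evidence accumulates.
result = GenomicResult(SequenceObservation("variant-1", "Lab A", "gene panel", date(2012, 1, 5)))
result.reinterpret(Interpretation("unknown significance", date(2012, 2, 1), "initial report"))
result.reinterpret(Interpretation("likely pathogenic", date(2013, 6, 1), "reanalysis"))
print(result.current().classification)  # likely pathogenic
print(len(result.interpretations))      # 2
```

Freezing the observation while letting the interpretation list grow mirrors the contrast drawn earlier between the permanence of biomolecular observations and the impermanence of their interpretations.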

To implement these features, HIT developers must agree on standardized genetic terminology and methods of data transfer. Standard structures and language are required for genetic data to be exchanged between clinics/EHR systems and to drive clinical decision support tools (figure 3). Health Level 7 (HL7), the widely used standard for EHR data transfer, has a clinical genomics work group working on areas such as the genetic testing report form and the family history model.116–118 Currently, the main standards used for genetic tests include the Logical Observation Identifiers Names and Codes (LOINC), the Human Genome Variation Society's (HGVS) nomenclature, the HUGO Gene Nomenclature Committee's (HGNC) terminology, NCBI Reference Sequences (RefSeq), the database of single nucleotide polymorphisms (dbSNP) and the International System for Human Cytogenetic Nomenclature (ISCN). The Genomic Variation Format (GVF) also facilitates genetic data incorporation in EHRs through simplified formatting.119 Standards for associating genetic mutations with disease states or drugs are LOINC, SNOMED and RxNorm, but many terms used to describe genetic diseases that are listed in the Online Mendelian Inheritance in Man database are not easily mapped to SNOMED or integrated into existing EHR systems.120 Further collaborative research between HIT developers and genetic research and clinical experts is needed to develop and test standardized models to ascertain the best representation of all genetic data.
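Standardized nomenclature is what makes genetic results machine-readable. As a narrow illustration, the Python sketch below recognizes exactly one HGVS pattern, a single-nucleotide substitution on a RefSeq coding transcript such as NM_004006.2:c.4375C>T (an illustrative identifier), and rejects everything else. It is not a general HGVS parser, which must also handle intronic offsets, deletions, insertions, duplications and other variant types; the function and class names are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches only a single-nucleotide substitution at a plain coding position
# on an NM_ RefSeq transcript, e.g. "NM_004006.2:c.4375C>T".
_SIMPLE_CODING_SUB = re.compile(
    r"^(?P<refseq>NM_\d+\.\d+):c\.(?P<pos>[1-9]\d*)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class CodingSubstitution:
    refseq: str     # transcript accession with version
    position: int   # coding (c.) position
    ref: str        # reference base
    alt: str        # observed base


def parse_simple_substitution(hgvs: str) -> CodingSubstitution:
    """Parse one narrow HGVS form; raise ValueError for anything else."""
    m = _SIMPLE_CODING_SUB.match(hgvs.strip())
    if m is None:
        raise ValueError(f"not a simple coding substitution: {hgvs!r}")
    if m["ref"] == m["alt"]:
        raise ValueError("reference and alternate bases are identical")
    return CodingSubstitution(m["refseq"], int(m["pos"]), m["ref"], m["alt"])


print(parse_simple_substitution("NM_004006.2:c.4375C>T"))
# CodingSubstitution(refseq='NM_004006.2', position=4375, ref='C', alt='T')
```

Rejecting unrecognized forms outright, rather than guessing, is a deliberate choice: a result that cannot be parsed unambiguously should be flagged for human review instead of silently entering the EHR.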

Figure 3

Diagram showing the integration of genetic data into the electronic health record (EHR) and a clinical laboratory improvement amendments (CLIA) certified laboratory for genetic tests. Also shown are examples of standards available for: data messaging (HL7), genetic data annotation (genomic variation format (GVF), HUGO Gene Nomenclature Committee's terminology (HGNC), Human Genome Variation Society's nomenclature (HGVS)) and genetic data representation for clinical use (Logical Observation Identifiers Names and Codes (LOINC), SNOMED, RxNorm, GVF).

Once genetic data standardization is agreed upon, multiple technological issues must still be addressed. As noted above, the unique nature of genetic data, with its simultaneous permanence and impermanence, requires that the biomolecular (permanent) findings be available in machine- and human-readable forms that allow re-interpretation as new understanding of these data develops.121 In deciding how to incorporate genetic data into EHRs, a distinction must be made between validated and unvalidated test results. More research is needed to determine how best to ensure that test results that are not yet clinically valid or useful do not inappropriately influence patient care. The GeneInsight Suite platform uses a reporting system within the EHR that designates a variant as benign, likely benign, unknown significance, likely pathogenic or pathogenic.122 This allows providers to see all variants; however, this amount of data may be difficult for providers and EHR systems to manage. Research also needs to evaluate whether a variant labeled as benign or of unknown significance may result in ‘vulnerable child syndrome’ or a population of ‘worried well’ due to persistent concern about such ‘abnormal’ results.123–125
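The five-tier scheme lends itself to an ordered type, which makes display policies easy to express. The Python sketch below assumes a hypothetical policy in which variants at or above "likely pathogenic" are shown prominently while the rest remain available but collapsed; the threshold and the variant identifiers are illustrative assumptions, not a description of how GeneInsight presents results.

```python
from enum import IntEnum


class VariantClass(IntEnum):
    """Five-tier classification, ordered from least to most concerning."""
    BENIGN = 1
    LIKELY_BENIGN = 2
    UNKNOWN_SIGNIFICANCE = 3
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5


def split_for_display(
    variants: dict[str, VariantClass],
    threshold: VariantClass = VariantClass.LIKELY_PATHOGENIC,
) -> tuple[dict[str, VariantClass], dict[str, VariantClass]]:
    """Hypothetical policy: return (shown prominently, kept but collapsed)."""
    prominent = {v: c for v, c in variants.items() if c >= threshold}
    collapsed = {v: c for v, c in variants.items() if c < threshold}
    return prominent, collapsed


report = {  # hypothetical variant identifiers
    "variant-A": VariantClass.PATHOGENIC,
    "variant-B": VariantClass.UNKNOWN_SIGNIFICANCE,
    "variant-C": VariantClass.LIKELY_BENIGN,
    "variant-D": VariantClass.LIKELY_PATHOGENIC,
}
shown, collapsed = split_for_display(report)
print(sorted(shown))      # ['variant-A', 'variant-D']
print(sorted(collapsed))  # ['variant-B', 'variant-C']
```

Collapsing lower tiers rather than deleting them preserves access while reducing the volume presented by default; whether such a policy actually reduces the concerns described above is the kind of question the text notes still requires research.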

One possible mechanism to address these concerns is to use a separate warehouse (a virtual shadow chart) to store all test results and interpretations that are not yet validated or useful, uploading them into the EHR only when they are determined to meet high standards for clinical validity and utility. Ideally, this determination would be made by a central body, with a local specialist approving transfer into the EHR and confirming that appropriate patient consent and counseling occurred. This system would mimic the current research protocol of the Pharmacogenomics of Anticancer Agents Research in children (PAAR4Kids).126 Problems with such a system include the need for a separate, secure warehouse, the need for robust warehouse/EHR interoperability, the current shortage of genetic specialists, and the current lack of a central validating body to determine a test's readiness for EHR integration. The legal and ethical risks of excluding unvalidated information would also need to be determined through further scholarly work.127
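The gating rule described for the shadow chart can be stated precisely. The Python sketch below, with hypothetical field and function names, holds a result in the shadow chart until every condition is met (clinical validity and utility established, specialist approval, and documented consent and counseling) and reports which conditions are still blocking release. It illustrates the proposal in the text, not the PAAR4Kids implementation.

```python
from dataclasses import dataclass

# Conditions that must all hold before a result leaves the shadow chart.
RELEASE_CRITERIA = (
    "clinically_valid",
    "clinically_useful",
    "specialist_approved",
    "consent_documented",
    "counseling_documented",
)


@dataclass
class PendingResult:
    result_id: str
    clinically_valid: bool = False       # determination by a central validating body
    clinically_useful: bool = False
    specialist_approved: bool = False    # local specialist sign-off
    consent_documented: bool = False
    counseling_documented: bool = False


def blocking_reasons(result: PendingResult) -> list[str]:
    """Criteria not yet met; an empty list means the result may be released."""
    return [name for name in RELEASE_CRITERIA if not getattr(result, name)]


def triage(results: list[PendingResult]) -> tuple[list[str], list[str]]:
    """Split result IDs into (upload to EHR, keep in shadow chart)."""
    to_ehr, shadow = [], []
    for result in results:
        (shadow if blocking_reasons(result) else to_ehr).append(result.result_id)
    return to_ehr, shadow


ready = PendingResult("r1", True, True, True, True, True)
awaiting = PendingResult("r2", clinically_valid=True, clinically_useful=True)
print(triage([ready, awaiting]))   # (['r1'], ['r2'])
print(blocking_reasons(awaiting))  # ['specialist_approved', 'consent_documented', 'counseling_documented']
```

Defaulting every criterion to False means a result stays in the shadow chart unless each condition has been positively recorded, which errs on the side of caution the text advocates.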

Given the large volume of data that make up the human genome and its interpretation, genomic data storage requirements must be reduced and search and retrieval capabilities improved. As discussed by Masys et al,115 general-purpose compression formats are inadequate for genetic data because of the need for lossless reduction and the extremely large datasets involved. They proposed lossless compression of genetic data by “representing personal nucleotide and/or protein sequences as the difference between the individual and what [they] propose calling a ‘Clinical Standard Reference Genome’”, resulting in a 100-fold reduction in data size.115 Drawbacks of this method include the current lack of a universal standard reference genome, the large storage space needed for the reference genome, and the significant time needed to decompress the data.128 Qiao et al128 proposed a compression algorithm called “SpeedGene” that chooses from among three algorithms based on disk space and claims to overcome the problems of large files and long loading times of genetic data. Vey129 proposed the “Differential Direct Coding algorithm”, which distinguishes expected sequence data containing the usual nucleotide bases from other data types and codes these data differently. This algorithm accurately codes (rather than removes and separately stores) genetic data that may contain wildcards (other nucleotide bases), annotation data or special repeats. This shortens compression and decompression time and eliminates the need for separate storage of wildcard data.129 These methods appear promising, but further research is needed to determine the best method of genetic data compression.
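The core idea behind reference-based compression, storing only the positions where an individual differs from a reference, can be illustrated in a few lines. This is a deliberately simplified sketch, not the method of Masys et al: it assumes the two sequences are already aligned and of equal length, so it handles only single-base substitutions (no insertions or deletions), and the function names are hypothetical.

```python
def encode_against_reference(reference, sample):
    """Store only (position, base) pairs where `sample` differs from `reference`.

    Simplifying assumption: both sequences are aligned and of equal length,
    so only substitutions are represented.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch handles equal-length, aligned sequences only")
    return [(i, base)
            for i, (ref_base, base) in enumerate(zip(reference, sample))
            if ref_base != base]


def decode_from_reference(reference, diffs):
    """Rebuild the original sample exactly (lossless) from reference + diffs."""
    seq = list(reference)
    for pos, base in diffs:
        seq[pos] = base
    return "".join(seq)
```

Because most positions of an individual's sequence match the reference, the list of differences is typically far smaller than the sequence itself, yet decoding recovers it exactly. The sketch also makes the drawbacks noted above visible: decoding is impossible without the identical reference, which must itself be stored and agreed upon.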

In addition to data compression, innovations are needed in data storage and computational capability. One solution to both of these problems could be cloud computing, in which customers pay only for the storage and computation time they use.130–132 The flexibility of a system that expands or shrinks as storage and computation needs change may provide a good economic option.133,134 Disadvantages of cloud computing include data transfer time to and from the cloud and legal/ethical issues regarding patient data security and privacy.130,135,136 Heterogeneous computational environments offer another way to increase computational speed, using computers containing specialized accelerators (graphics processing units) that improve arithmetic processing 10- to 100-fold.130 Although this technology will not alleviate genetic data storage concerns, it could be coupled with cloud computing to maximize both computational and storage capability. A difficulty with this faster technology is that it requires specialized programming languages and informatics expertise to develop or modify applications.130 Another solution to problems with storage and processing of genetic big data may be found in the open-source software project Hadoop and its programming model, MapReduce, which partitions structured and unstructured data across multiple computers and distributes their analysis to run in parallel. This can be accomplished in the cloud or with a hybrid cloud-cluster architecture, allowing big data to be stored, queried and processed quickly and cost effectively.137,138 Hadoop may be well suited to the volume and heterogeneity of genetic data, but security concerns over data storage location (cloud vs clusters of computers) and associated software usage will need to be addressed to ensure HIPAA compliance. Knome has proposed a solution to security and privacy problems in genome interpretation: a stand-alone system with enough storage and computing power to process one genome per day.139
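The MapReduce programming model can be illustrated without Hadoop itself. The sketch below is plain Python and uses no Hadoop API: the input partitions that Hadoop would spread across machines are simply lists processed in a loop, and the tab-separated variant record format (`chrom`, `pos`, `ref`, `alt`) is a hypothetical example. The job counts variants per chromosome by mapping each record to a key/value pair, grouping the pairs by key (the shuffle), and reducing each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit (chromosome, 1) for one record 'chrom\\tpos\\tref\\talt'."""
    chrom = record.split("\t", 1)[0]
    yield chrom, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def run_job(partitions):
    """Map every record in every partition, then shuffle and reduce.

    In Hadoop each partition would be processed on a different node;
    here they are processed sequentially in a single process.
    """
    mapped = chain.from_iterable(map_phase(r) for part in partitions for r in part)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each mapper sees only its own records and each reducer sees only one key, both phases can run in parallel on different nodes, which is what lets the model scale to large genomic datasets.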

Data security and privacy concerns are not unique to genomic data, but they are critical to address given the ethical issues discussed above, the pressing need for data sharing, and the vanishing concept of de-identification as it relates to genomic data.20,57,140–146 The US Presidential Commission for the Study of Bioethical Issues recently released recommendations for ensuring patient privacy and data security while promoting scientific advancement through the use of whole genome sequencing. The five recommendations address the need for: “strong baseline protections while promoting data access and sharing; data security and access to databases; consent; facilitating progress in whole genome sequencing; [and] public benefit.”147 These recommendations cover many of the same issues addressed by the 2007 Working Conference of the American Medical Informatics Association, which proposed a model of data stewardship for health data collection, storage, use and disclosure. The resulting paper by Bloomrosen and Detmer148 defined the principles of data stewardship to include accountability, transparency, the need for consent, appropriate notice to patients regarding use of their data (consistent with permitted uses/disclosures), attention to technical issues involving security and de-identification of data, and enforcement of these principles. Models that allow both clinical and research use of data have been proposed, including caBIG, i2b2, STRIDE and the Information Warehouse.149–152 An excellent evaluation of different integrated data storage and use structures is provided by Louie et al.153

Conclusion and future steps

Ethical issues surrounding the incorporation of genetic data into EHRs can be summarized as a need to maximize good and minimize harm while respecting people as autonomous beings with a right to make their own decisions. These principles have been applied to genetic data integration to identify the following key challenges HIT professionals face.

  1. Standardization of genetic content (raw data, ontology) to allow accurate and efficient interpretation of genetic data from EHRs, facilitate development of computerized CDS tools, and secure transmission of data between EHRs. As stabilization of such standards can take decades, warehousing of raw genetic data may be a good practice for the time being.

  2. Development of information infrastructure to facilitate genetic test validation and computerized CDS tool creation to assist providers with interpretation, follow-up and re-interpretation of these data.

  3. Development of efficient storage of genetic data using compression and archiving methods uniquely suited to the size, structure and security requirements of these data.

  4. Development of workflow models that allow secure, auditable clinical and research use of genetic data.

Other areas of need include computerized training programs in genetics and statistics, better patient consent tools and enhanced educational focus on statistics. This paper is offered as a review of current challenges and avenues for continuing exploration as we journey towards the ethical and efficient incorporation of genetic data in EHRs and, ultimately, the delivery of personalized medicine.


KS, EAM and UT: Conception and design of manuscript. KS: Bibliography search and summary of articles. EAM, NF and UT: Assistance with bibliography search and summary of articles. KS, NF and EAM: Analysis of articles. KS, NF, UT and EAM: Writing of the manuscript. All authors approved the final version of the manuscript.


This work was supported by the Clinical and Translational Science Award (CTSA) program, through the NIH National Center for Advancing Translational Sciences (NCATS), grant UL1TR000427, and NLM Grant 5T15LM007359 to the Computation and Informatics in Biology and Medicine Training Program. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Competing interests


Provenance and peer review

Not commissioned; externally peer reviewed.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/3.0/) which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

