Abstracts

Keynote Speakers

Dr Sergey Koren

Staff Scientist, National Human Genome Research Institute

GENOME ASSEMBLY FOR THE LONG-READ ERA

A complete and accurate genome sequence forms the basis of all downstream genomic analyses. However, even the human reference genome remains incomplete, which affects the quality of experiments and can mask true genomic variations. For most other species, high-quality reference genomes do not exist. Reference projects typically focus on a single, usually inbred, individual to minimize heterozygosity and simplify assembly. This approach mixes haplotypes, hides variation, and introduces false duplication, which causes errors in downstream analysis. "Trio binning" is designed specifically for heterozygous genomes, resulting in a complete diploid reconstruction. On a benchmark human trio, this method achieved high accuracy and recovered complex structural variants missed by alternative approaches, including the highly polymorphic MHC region. We applied this method to several human trios, fish, bird, and mammal genomes, including an F1 cross between two Bovinae species. As a result, we completely assembled both parental haplotypes with NG50 haplotig sizes >65 Mbp and 99.998% accuracy, surpassing all current mammalian assemblies. These haplotype-resolved assemblies enable precise surveys of structural variation to create more representative references, providing opportunities to study complex variation.
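For readers unfamiliar with the NG50 statistic quoted above, the following is a minimal sketch of how N50 and NG50 are conventionally computed. The function name `nx50`, the contig lengths and the 400 Mbp genome size are illustrative assumptions, not values from the talk.

```python
def nx50(lengths, total=None):
    """Return the N50 of `lengths`, or the NG50 when `total` is given.

    N50 uses half of the summed assembly length as the target;
    NG50 uses half of an (estimated) genome size instead.
    """
    target = (total if total is not None else sum(lengths)) / 2
    running = 0
    # Walk from the longest sequence down until half the target is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return 0  # the assembly covers less than half of `total`


# Hypothetical contig (or haplotig) lengths in Mbp, for illustration only.
contigs = [80, 70, 50, 40, 30, 20]
n50 = nx50(contigs)               # target is half of the 290 Mbp assembly
ng50 = nx50(contigs, total=400)   # target is half of an assumed 400 Mbp genome
print(n50, ng50)
```

Because NG50 is normalised by genome size rather than assembly size, it allows assemblies of different total length to be compared on the same footing.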

Professor Ewan Birney

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom.

BIG DATA IN BIOLOGY AND MEDICINE

Molecular biology is now a leading example of a data intensive science, with both pragmatic and theoretical challenges being raised by data volumes and dimensionality of the data. These changes are present in both “large scale” consortia science and small scale science, and now across a broad range of applications – from human health, through to agriculture and ecosystems. All of molecular life science is feeling this effect.

As molecular techniques – from genomics through transcriptomics and metabolomics – drop in price and turnaround time, there is a wealth of opportunity for clinical research and, in some cases, active changes to clinical practice even at this early stage. The development of this work requires inter-disciplinary teams spanning basic research, bioinformatics and clinical expertise.

This shift in modality is creating a wealth of new opportunities and has some accompanying challenges. In particular, there is a continued need for a robust information infrastructure for molecular biology and clinical research. This ranges from the physical aspects of dealing with data volume through to the more statistically challenging aspects of interpreting it. A particular problem is finding causal relationships in the high level of correlative data. Genetic data are particularly useful in resolving these issues.

Dr Kathie Grant

Public Health England

EXPLOITING GENOMICS FOR INVESTIGATING GASTROINTESTINAL INFECTIOUS DISEASE

There are approximately 17 million cases of gastrointestinal infectious illness each year in England and Wales and 5000 deaths. This comes with a significant cost to individuals who are ill, the health service and employers through lost days of work. Public Health England works together with a range of partner organisations to detect, identify and reduce such infections.

The successful identification and investigation of gastrointestinal infectious illness depends upon microbiological and epidemiological tools and trace back studies being used in concert to identify the pathogen, define and quantify the number of cases, detect the source of infection and determine the route of transmission. This enables effective control measures to be implemented and action to be taken to stop further cases and prevent outbreaks in the future. Tracking infections to their original source is often complicated, particularly when transmission involves the food chain and may involve multiple processing and distribution steps and more than one country.

Whole genome sequencing (WGS) offers unprecedented levels of sensitivity and specificity for determining the genetic relatedness of bacterial strains and has proven to be a transformational tool for investigating gastrointestinal infectious illness. The application of WGS to gastrointestinal bacterial pathogens is able to provide strong microbiological evidence linking cases of illness and ruling out cases that might be caused by the same pathogen but not the same strain. This is refining case definitions in outbreaks and increasing the power of epidemiological investigations. WGS is identifying clusters and outbreaks of disease previously unidentified by conventional typing and surveillance tools and is providing stronger links between isolates from human illness and those from food, animal and environmental samples. In addition, WGS information on the evolutionary relationship of strains is providing enhanced source attribution and evidence for the initial point of contamination as well as geographical signals as to where the strain may have originated. The real-time use of WGS is having a groundbreaking impact on the ability to monitor and investigate gastrointestinal infectious disease, allowing better public health controls and preventative action to be implemented and their effectiveness to be evaluated more accurately.

Kevin Davies

Author, The $1,000 Genome and Cracking the Genome;

Founding Editor, Nature Genetics;

Executive Editor, The CRISPR Journal

OBSERVATIONS ON THE ROAD TO THE $1000 GENOME.

Since his first book Breakthrough (co-authored with a former Thompson Twin) which covered the race to isolate the breast cancer gene, Kevin Davies, the founding editor of Nature Genetics, has enjoyed publishing and reporting on advances in genome research. His subsequent books, Cracking the Genome and The $1,000 Genome, spanned the Human Genome Project and advances in next-gen sequencing and consumer genetics. The latter led to a personal invitation to collaborate with Jim Watson on an updated version of DNA: The Story of the Genetic Revolution. In this talk, Kevin shares stories and highlights from the genome revolution, including his current fascination with CRISPR and gene editing.

Speakers

Professor Mark Akeson

University of California, Santa Cruz

THERE AND BACK AGAIN: SEQUENCING RNA WITH NANOPORES

Nanopore polynucleotide sequencing was conceived in the late 1980s, and implemented as a practical laboratory tool in 2014. Although DNA sequencing was the main focus during that interval, capture and translocation of RNA strands provided the earliest evidence that nanopore sequencing had promise. In 2018, nanopore sequencing of native RNA strands re-emerged as a unique tool for describing transcriptomes. In my talk, I will briefly revisit the original RNA nanopore experiments published in the 1990s, then follow up by discussing a recent multi-center study that generated 9.9 million aligned native poly(A) RNA reads for a model cell line using Oxford Nanopore platforms. These native RNA reads included multi-exon transcripts up to 22 kilobases in length, and ionic current signatures that reveal base modifications.

Dr Niall Gormley

Principal Scientist, Illumina

SAMPLE PREPARATION FOR NGS ON A SURFACE

NGS platforms require biological samples to be transformed into a library format prior to sequencing. This usually entails several enzymatic, sometimes mechanical, steps performed on the bench. Traditionally, these reactions have been performed in solution phase in tubes, in contrast to the sequencing reactions or clustering reactions that are performed on a surface of a flow cell. Illumina recently introduced a novel sample preparation workflow where libraries are also generated on a surface: on paramagnetic beads. In this talk, the characteristics and potential of generating sequencing libraries on a surface will be discussed.

Dr Lia Chappell

Post Doctoral Fellow, Wellcome Sanger Institute

METHODS FOR SEQUENCING THOUSANDS OF CELLS IN PARALLEL

In this talk I’ll present an overview of the range of methods currently available for sequencing thousands of cells in parallel, contrasting the benefits and disadvantages of “open-source” methods (such as Drop-seq, InDrop and Seq-Well) with commercially available platforms (such as 10x). I’ll touch on some highlights of our efforts to further extend and improve the Seq-Well platform (developed at MIT); this flexible, portable and instrument-free method requires almost no specialised equipment in the lab where the single cells are processed into libraries.

Barry Merriman Ph.D

Chief Science Officer & Co-Founder, Roswell Biotechnologies, Inc.

THE FINAL DISRUPTION: MOLECULAR ELECTRONICS FOR DNA

The $1000 genome, long considered the inflection point for genome sequencing to gain clinical adoption, is in fact far too costly for population scale genome sequencing projects. In order for whole genome sequencing to assume its proper role as a cornerstone of precision medicine, a $100 genome is essential, and a roadmap to far lower costs is further essential for global adoption. This requires a major technology disruption. In this talk, we introduce the molecular electronics solution, which provides a near-term path to the $100, 1-hour genome for Precision Medicine, combined with a long term roadmap to far lower costs and far greater speeds. In this approach, single molecules are integrated as sensor elements into CMOS integrated circuits, to produce CMOS sensor pixel chips that offer the maximum in scalable, manufacturable nanotechnology. The molecular electronics sensor chips under development at Roswell represent the realization of a 50-year-old vision of integrating single molecules into circuits on chip. We also comment on how this approach enables the new sector of DNA digital data storage, by enabling a DNA data reader platform that can read at the Exabyte scale, and exceed the economics of reading from existing archival storage media, such as magnetic tape.

Cameron Frayling

CEO, Base4

AN INTRODUCTION TO THE BASE4 SEQUENCING PLATFORM.

Base4 began development of a new sequencing platform in 2014, focussed on single molecules, long reads, and the ability to read modifications such as DNA methylation directly, without bisulphite conversion. We have successfully developed a reliable and extremely capable chemistry which is able to detect and characterise individual nucleotides, including their methylation states. We are now developing a prototype platform which enables simple manipulation of individual molecules of DNA and allows each of those molecules to be sequenced. The technology is still in development but shows tremendous promise in delivering extremely high accuracy single molecule DNA sequences with simultaneous readout of multiple different methylation states.

Deyra Rodriguez

Product development scientist, New England Biolabs

NOVEL APPROACHES TO ADDRESS CHALLENGES IN SAMPLE PREPARATION

RNA sequencing (RNA-seq) has become the tool of choice for transcriptome profiling and discovery. As RNA-seq is increasingly adopted, the demand for RNA library construction methods that produce high quality, reproducible libraries from small amounts of precious material is rising. To meet this need, we have developed two streamlined RNA-seq library preparation methods that can be used across a wide range of input RNA, from single cells to a microgram of total RNA. Sequencing data from these methods show that important parameters such as GC content, gene body coverage and gene expression correlation remain consistent across input amounts. As a result, our methods have increased sensitivity and specificity for low-abundance transcripts, and reduced PCR duplicates and sequence bias, delivering high quality data.

Andy Higgs

UK Operations Manager, Advanced Analytical

AUTOMATING QC OF LARGE DNA WITH THE FEMTO PULSE

In creating the FEMTO Pulse, we made significant and innovative changes to allow it to separate nucleic acid smears through 200,000 bp and to achieve femtogram level sensitivity. Currently it has more than 10 times the sensitivity for smears and 100 times more sensitivity for DNA fragments than other instruments. In fact the FEMTO Pulse can detect DNA fragments down to a concentration of 5 femtogram/microliter, in the well.

Professor Federica Di Palma

Director of Science at the Earlham Institute (EI) and director of the BRIDGE Colombia network of researchers across the UK and Colombia

EVOLUTION OF GENE REGULATORY NETWORKS CONTROLLING TRAITS UNDER NATURAL SELECTION IN CICHLIDS

Gene regulatory network evolution is a key driver of anatomical innovations, serving as a substrate for the evolution of phenotypic diversity and adaptation. However, little is known about the genome-wide evolution of regulatory networks (genotype) and their potential phenotypic effect across ecologically-diverse species (ecotype). In vertebrates, the phenotypic and ecotypic diversification of East African cichlids is unparalleled, implying the rapid evolution of regulatory regions and networks underlying the traits under selection during the early stages of speciation. To investigate tissue-specific evolution of gene regulatory networks along a phylogeny, we developed a framework to identify ancestral reconstructed and extant species co-expression modules and their associated regulators (cis-regulatory elements, transcription factors and miRNAs) and applied this framework to six tissues in five East African cichlids. Along the phylogeny, our analyses identified modules with tissue-specific patterns across the five cichlid species that were predicted to be regulated by diverged suites of regulators. We report striking cases of rapid network rewiring for genes known to be involved in traits under natural and/or sexual selection, such as the visual system. In regulatory regions of visual opsin genes, e.g. sws1, polymorphisms in transcription factor binding sites (TFBSs) have driven network rewiring, consistent with ecological niches of different lake species. Within the same lake species but between ecologically diverse groups, segregating TFBSs suggest ecotype-associated network rewiring in East African cichlid radiations. Our unique integrative approach to infer regulatory networks across multiple species allowed us to identify the rapid regulatory changes associated with traits under selection in radiating cichlids.

Daniel Mead

Project manager of The 25 Genomes Project, Wellcome Sanger Institute.

THE 25 GENOMES PROJECT, SEQUENCING A BROAD TAXONOMIC ASSORTMENT OF UK SPECIES.

The Sanger Institute turns 25 this year and as a part of the celebrations we’re producing high quality reference sequences for 25 UK species. From the selection of the species, through sample collection, DNA extraction, sequencing and assembly, many interesting challenges have arisen and some of them have even been overcome. These genomes are being sequenced using PacBio, 10X Chromium and (in some cases) Bionano and Hi-C, with the aim of producing assemblies with contig N50 >1Mb and scaffold N50 >10Mb. This talk will also demonstrate, amongst other things, that a one-size-fits-all pipeline is not the best way to go and why picking the blackberry as a species to sequence demonstrates the hidden complexity of taxonomy and how naïve expectations can make a project more complex.

Ramiro Alberio

Authors: Priscila Ramos-Ibeas¹, Fei Sang², Sarah Withey¹, Walfred Tang³,⁴, Qifan Zhu¹, Doris Klisch¹, Matt Loose², Azim Surani³,⁴ and Ramiro Alberio¹

¹School of Biosciences, University of Nottingham, Sutton Bonington Campus, LE12 5RD, UK, ²School of Life Sciences, University of Nottingham, Nottingham, NG7 2RD, UK, ³Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK, ⁴Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK

LINEAGE SEGREGATION, X CHROMOSOME DYNAMICS AND REGULATION OF PLURIPOTENCY DURING PIG EMBRYOGENESIS REVEALED BY SINGLE CELL RNA-SEQ

Pre-implantation embryo development follows regulative processes of lineage segregation and transcriptional regulation that culminate with the formation of the epiblast, the cells of which give rise to all the fetal lineages. How these processes are regulated in large mammals is poorly understood. Here, we present the transcriptional map of pig embryo development following single-cell RNA-seq. We show gradual segregation of the inner cell mass and trophectoderm in early blastocysts, followed by epiblast and hypoblast segregation in late blastocysts. In females, dosage compensation and X chromosome inactivation are accomplished in the late epiblast before lineage priming. We reveal the transcriptional circuitry and signaling effectors of pluripotency that define a short naïve pluripotent phase followed by a protracted primed stage. This detailed transcriptional analysis provides a blueprint for understanding early embryogenesis in the pig embryo that will impact the development of stem cell technologies in domestic animals.

Seyhan Yazar

Authors: Seyhan Yazar¹,², Tamieka A Fraser³,⁴, Alison Meynert¹, Adnan Moussalli⁵, Jeremy J Austin⁶, Janine Deakin⁷, Alynn Martin³, Sandy S Hung⁸, David A Mackey², Oz Mammals Genomics Consortium, Anna J MacDonald⁹, Adam Polkinghorne⁴, Matthew A Brown¹⁰, Martin Taylor¹, Colin Semple¹, Scott Carver³*, Alex W Hewitt⁸,¹¹*

¹ Medical Research Council (MRC) Human Genetics Unit, Institute of Genetic and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom, ² Centre for Ophthalmology and Visual Science, University of Western Australia, Perth, Western Australia, Australia, ³ Department of Biological Sciences, University of Tasmania, Hobart, Tasmania, Australia, ⁴ Centre for Animal Health Innovation, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia, ⁵ Sciences Department, Museums Victoria, Carlton Gardens, Victoria, Australia, ⁶ Australian Centre for Ancient DNA, School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia, ⁷ Institute for Applied Ecology, University of Canberra, Bruce, Australian Capital Territory, Australia, ⁸ Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, Victoria, Australia, ⁹ Australian National University, Canberra, Australian Capital Territory, Australia, ¹⁰ Institute for Health and Biomedical Innovation, Translational Research Institute, Queensland University of Technology, Brisbane, Australia, ¹¹ School of Medicine, Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia.

* These authors contributed equally to this work.

DE NOVO GENOME AND TRANSCRIPTOME ASSEMBLIES OF THE BARE-NOSED WOMBAT

Wombats are among Australia's most iconic marsupials. They represent the world’s largest burrowing herbivores and are threatened by a range of processes, including disease, collision with motor vehicles and conflict with land holders. As part of the Oz Mammals Genomics Consortium, we have been working on de novo genome and transcriptome sequencing of the bare-nosed wombat (Vombatus ursinus). Our datasets include deep Illumina HiSeq 4000 paired-end sequencing data (100x coverage), 10X Genomics microfluidics-based linked reads (56x), low-coverage PacBio single-molecule real time (SMRT) sequencing data (5x) and RNA-seq data generated from six different tissues using Illumina HiSeq 4000 paired-end technology from a single animal. Additional genomes are currently being sequenced to study differences among wombat species. The draft genome assembly has an N50 scaffold size of 29.4 Mbp with an estimated assembly size of 3.6 Gbp. Four transcriptome assemblies were generated with four different assemblers, using a total of 862 million reads pooled from five of the six tissues. A consensus set of 135,741 unigenes was constructed using a published pipeline of CD-HIT-EST and annotated with Trinotate to serve as the final representative transcriptome assembly. We used the Benchmarking Universal Single-Copy Orthologs (BUSCO) library of mammalian orthologous genes for quality assessment and recovered 95% of 4101 single-copy mammalian orthologs in the transcriptome assembly. This presentation will report the updated state of the genome and transcriptome assemblies and discuss assembly approaches applied using multiple platforms.

Dr. Pedro H. Oliveira

Department of Genetics and Genomic Sciences, Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, New York, United States of America.

THE CHROMOSOMAL ORGANIZATION OF HORIZONTAL GENE TRANSFER IN BACTERIA

Bacterial adaptation is accelerated by the acquisition of novel traits through horizontal gene transfer, but the integration of these genes affects genome organization. We analyzed 932 complete genomes of 80 bacterial species, and found that transferred genes are concentrated in only ~1% of the chromosomal regions (hotspots)¹. This concentration increases with genome size and with the rate of transfer. Hotspots diversify by rapid gene turnover; their chromosomal distribution depends on local contexts (neighboring core genes), and content in mobile genetic elements. Hotspots concentrate most changes in gene repertoires, reduce the trade-off between genome diversification and organization, and should be treasure troves of strain-specific adaptive genes. Most mobile genetic elements and antibiotic resistance genes are in hotspots, but many hotspots lack recognizable mobile genetic elements and exhibit frequent homologous recombination at flanking core genes. Overrepresentation of hotspots with fewer mobile genetic elements in naturally transformable bacteria suggests that homologous recombination and horizontal gene transfer are tightly linked in genome evolution. Knowing the organizational traits of chromosomes might facilitate large-scale genetic engineering and should lead to a better understanding of the evolutionary interactions between horizontal gene transfer and genome organization.

¹ Oliveira, PH; Touchon, M; Cury, J; Rocha, EPC. (2017). The chromosomal organization of horizontal gene transfer in bacteria. Nature Communications. 8, 841.

Dr Gemma Langridge

Medical Microbiology Research Laboratory, Norwich Medical School, University of East Anglia, Norwich, NR4 7UQ, UK

METABOLIC SIGNATURES OF HOST ADAPTATION IN SALMONELLA ENTERICA

Serotyping separates isolates of Salmonella enterica into more than 1,500 serovars. Many serovars contain isolates which have biological coherence (e.g. S. Typhi all cause enteric fever in humans), but this is not the case for all serovars. The focus here is upon isolates with the antigenic formula (O: H phase I: H phase II) 6,7:c:1,5: S. Choleraesuis, S. Paratyphi C and S. Typhisuis. This group contains strains that are adapted to different animal hosts and are currently typed by biochemical tests. These three closely related but very different serovars therefore represented an opportunity to investigate host adaptation within the S. enterica species. Whole genome sequencing was used to analyse a collection of Salmonella which share the antigenic formula 6,7:c:1,5 but differ in host adaptation: S. Paratyphi C (human), S. Choleraesuis (both humans and swine) and S. Typhisuis (swine). Genes were identified which can be used for differentiating them in a diagnostic laboratory, and their metabolic ability in the context of their host adaptation was compared.

Professor James McInerney

University of Nottingham

WHY PROKARYOTES HAVE PANGENOMES.

In this talk I will outline the evidence that prokaryotic pangenomes are – on average – advantageous. I will also outline the reasons why advantageous gene acquisition and losses do not result in selective sweeps that remove variation in the population. I will end the talk by outlining some of the problems with everything I said.

Dr. William Rowe

Authors: Will P. M. Rowe¹*, Anna P. Carrieri², Edward O. Pyzer-Knapp², Lindsay J. Hall³, Martyn D. Winn¹

¹Scientific Computing Department, STFC Daresbury Laboratory, UK, ²IBM Research, The Hartree Centre, UK, ³Quadram Institute Bioscience, Norwich Research Park, Norwich, UK

GROOT AND HULK: SKETCHING MICROBIOMES FOR RESISTOME PROFILING AND DETERMINING ANTIBIOTIC DYSBIOSIS.

Motivation: Antimicrobial resistance (AMR) remains a major threat to global health. Profiling the collective AMR genes within a microbiome (the ‘resistome’) and determining antibiotic dysbiosis facilitates greater understanding of AMR gene diversity and dynamics, allowing for gene surveillance, individualized treatment of bacterial infections and more sustainable use of antimicrobials. However, these analyses can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed efficient and accurate methods for resistome profiling and determining antibiotic dysbiosis that address these complications and improve upon currently available tools.

Results: We present GROOT and HULK, two methods that utilise data sketching for rapid microbiome comparisons and similarity-search queries that can be performed in real-time on sequence data streams.

GROOT: GROOT combines variation graph representation of gene sets with a locality-sensitive hashing forest indexing scheme to allow for fast classification and alignment of metagenomic sequence reads to known AMR gene variants. On a set of clinical preterm infant microbiome samples, we show that GROOT can generate a resistome profile in 2 minutes using a single CPU (per sample), is more accurate than existing tools and can identify acquisition of AMR gene variants over time (e.g. gain of extended spectrum beta lactamase activity).

HULK: HULK employs streaming histogram sketching of k-mer spectra to obtain a sample signature that is suitable for similarity testing and machine learning classifiers. We show that HULK can sketch these same preterm infant microbiome samples in an equivalent time and can differentiate between antibiotic and non-antibiotic treated samples, enabling blinded clinical samples to be classified by antibiotic treatment history using a Gaussian process classifier (accuracy 0.953, F1-score 0.967, precision 0.972).

Availability and implementation: GROOT and HULK are written in Go and available at https://github.com/will-rowe/groot and https://github.com/will-rowe/hulk (MIT licenses).
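As background on the general idea of data sketching mentioned above, the following minimal example shows a plain bottom-k MinHash sketch of k-mers and the Jaccard similarity estimate it supports. It is not the authors' implementation: GROOT's locality-sensitive hashing forest over variation graphs and HULK's histogram sketch are more involved, and the function names, k-mer size and sketch size here are assumptions chosen for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=4, size=16):
    """Bottom-k MinHash: keep the `size` smallest k-mer hash values."""
    hashes = {
        int.from_bytes(
            hashlib.blake2b(km.encode(), digest_size=8).digest(), "big"
        )
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b):
    """Estimate Jaccard similarity of two k-mer sets from their sketches."""
    size = min(len(sketch_a), len(sketch_b))
    if size == 0:
        return 0.0
    # Smallest hashes of the merged sketches approximate a sketch of the union.
    union_bottom = sorted(set(sketch_a) | set(sketch_b))[:size]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in union_bottom if h in shared) / size


# Hypothetical reads, for illustration only.
read = "ACGTTGCAAGGCTTAACGGATCCA"
same = jaccard_estimate(minhash_sketch(read), minhash_sketch(read))
disjoint = jaccard_estimate(minhash_sketch("AAAAAAAA"), minhash_sketch("CCCCCCCC"))
print(same, disjoint)
```

The sketch has a fixed size regardless of how much sequence is processed, which is what makes this family of methods attractive for comparing large read sets.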

Ralph Vogelsang, PhD

EMEA Sales Development Manager, PacBio

SEE BACTERIAL GENOMES IN HIGH RESOLUTION WITH SMRT SEQUENCING

Single Molecule, Real-Time (SMRT®) Sequencing delivers long continuous reads (>20 kb), high consensus accuracy (up to 99.999% QV50), uniform coverage (even across high GC content regions), along with simultaneous epigenetics characterization.

Obtaining microbial genomes with high accuracy and contiguity has become faster and more affordable thanks to a new multiplexing barcoding kit, pooling tool, and streamlined analysis workflow. The increased throughput of the PacBio® Sequel® System enables multiple microbes to be sequenced on a single SMRT Cell, greatly reducing costs per genome assembly.

Ralph will review workflows and share example data to illustrate how the PacBio technology enables scientists to generate high-quality reference genomes, reconstruct intact genes and gene clusters, clarify the role of mobile elements in drug resistance and transmission, and assess the contribution of DNA modification to pathogenesis.

Professor Cathryn Lewis

Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience; Department of Medical and Molecular Genetics, Faculty of Life Science and Medicine; King’s College London

APPLYING POLYGENIC RISK SCORES TO PSYCHIATRIC DISORDERS - HYPE AND HOPE

Genome-wide association studies have finally begun to identify the genetic component to psychiatric disorders, with >150 variants associated with schizophrenia and 44 variants with major depression. Information on the genetic contribution to disorder risk is also captured by non-significant SNPs, which are included in calculating polygenic risk scores. Constructed from SNP effect sizes in a discovery genome-wide association study, polygenic risk scores give an individual-level measure of genetic liability to disease.
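The construction described above can be sketched as a weighted sum: each SNP's effect size from the discovery study is multiplied by the number of risk alleles an individual carries. The SNP identifiers, effect sizes and genotypes below are hypothetical, and treating a missing genotype as zero is a simplifying assumption; real analyses typically also apply steps such as p-value thresholding and pruning of correlated SNPs.

```python
def polygenic_risk_score(effect_sizes, dosages):
    """Sum of per-SNP effect size times risk-allele count (0, 1 or 2).

    SNPs missing from `dosages` contribute zero (a simplifying assumption).
    """
    return sum(beta * dosages.get(snp, 0) for snp, beta in effect_sizes.items())


# Hypothetical discovery-GWAS effect sizes (e.g. log odds ratios).
effect_sizes = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}

# Hypothetical risk-allele counts for one individual.
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_risk_score(effect_sizes, person)
print(round(score, 3))
```

In practice the raw score is usually standardised against a reference population before cases and controls are compared.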

Polygenic risk scores are well-powered to show statistically significant discrimination between cases and controls, but explain only a small proportion of liability to disease: 7% for schizophrenia and 2% for depression. Despite this, there are hopes that polygenic risk scores may have a role in clinical care. Scores could be used (1) to identify those at increased risk of disease, for whom intervention or monitoring may be appropriate, (2) to inform treatment options, potentially discriminating between the many anti-depressants and anti-psychotics available, or between pharmaceutical and psychological therapy, or (3) to determine likely prognosis, for example schizophrenia scores can be used to predict which first episode psychosis cases are likely to develop schizophrenia.

This session will assess the strengths and weaknesses of polygenic risk scores in psychiatric disorders, delineating the challenges to be overcome in moving them from the research setting towards the clinical arena.

Professor D. Gareth R Evans

Division of Evolution and Genomic Sciences, Faculty of Biology, Medicine and Health, University of Manchester, MAHSC, Manchester, UK and Prevention Breast Cancer Unit and Nightingale Breast Screening Centre, Manchester University NHS Foundation Trust (South), Manchester, UK.

BREAST CANCER PATHOLOGY AND STAGE ARE BETTER PREDICTED BY RISK STRATIFICATION MODELS THAT INCLUDE MAMMOGRAPHIC DENSITY AND COMMON GENETIC VARIANTS

Background: There are increasing efforts to stratify breast cancer risk to enable more targeted early detection and prevention strategies that will better balance the risks and benefits of population screening programmes.

Methods: Data from a subset of 9362 of the 57,902 women in the Predicting Risk Of Cancer At Screening (PROCAS) study were examined. These women were unaffected by breast cancer at study entry and provided DNA for a polygenic risk score (PRS). The PRS was analysed along with mammographic density (density residual, DR) and standard risk factors to assess future risk of breast cancer based on tumour stage, receptor expression and pathology (invasive/DCIS).

Results: For the 195 prospective incident breast cancers, a predictor based on Tyrer-Cuzick, DR and PRS was informative for subsequent breast cancer overall (IQ-OR=2.25 (1.89-2.68)) with excellent calibration (0.99). The model performed particularly well in predicting higher stage (stage 2+ IQ-OR=2.69 (2.02–3.60)) and ER+ BCs (ER+ IQ-OR=2.36 (1.93–2.89)). Individually, DR was most predictive for HER2+ and stage 2+ cancers but did not discriminate as well between poor and extremely good prognosis BC as either Tyrer-Cuzick or the PRS. In contrast, the PRS gave the highest OR for incident stage 2+ cancers (IQ-OR=1.79 (95% CI 1.30-2.46)). None of the three prediction measures, individually or in combination, were good predictors of ER negative breast cancer.

Conclusions: A combined approach using Tyrer-Cuzick, DR and PRS provides accurate risk stratification, particularly for poor prognosis cancers. This provides support for reducing the screening interval in high-risk women and increasing the screening interval in low-risk women defined by this model.

Dr Steven Pullan

Public Health England, National Infection Service, Porton Down

NANOPORE SEQUENCING FOR VIRAL CLINICAL SAMPLE INVESTIGATION; IN-FIELD METAGENOMICS FROM THE LARGEST EVER RECORDED LASSA FEVER OUTBREAK.

Emerging and re-emerging RNA viruses cause a significant global disease burden, ranging from mild febrile illness to haemorrhagic fevers. Rapid and unbiased identification methods, such as metagenomic MinION sequencing, are vital for the identification and characterisation of pathogens for which little prior knowledge is available. Portable methodologies for field use are required during such outbreaks, especially when they occur in resource-limited settings. We have demonstrated that metagenomic MinION sequencing can elucidate full viral genomes directly from clinical samples across a clinically relevant range of viral titres. Only a few months ago, we performed the first ever metagenomic nanopore sequencing direct from patient samples in an outbreak epicentre, during the largest Lassa fever outbreak ever recorded in Nigeria. The data generated on-site supported the outbreak response through rapid communication to national authorities, including the Nigerian Centre for Disease Control (NCDC), and the WHO on the genomic epidemiology of Lassa virus lineages present during the outbreak.

Dr. Greg Elgar

Genomics England

THE END OF THE BEGINNING - THE GENOMICS ENGLAND 100,000 GENOMES PROJECT

Genomics England was established in 2013 with the aim of integrating genomics into medicine and personalised healthcare. Since then a robust pipeline has been developed for whole genome sequencing in rare disease and cancer patients as part of the 100,000 Genomes Project. As this project approaches its target, it is timely to look at how this has been achieved, the lessons learnt, and the prospects for the future, as whole genome sequencing becomes part of the clinically commissioned Genomic Medicine Service for the NHS.

Kathryn Woodfine

Product Specialist, Agilent Technologies

THE AGILENT NGS WORKFLOW

Agilent presents its NGS workflow, from sample to data interpretation, including the latest library preparation offerings, SureSelect XT HS and SureSelect XT Low Input, which are suitable for limited samples such as FFPE. We will introduce our latest software platform, Alissa Align and Call, the next evolution of Agilent’s Alissa Clinical informatics solution. Agilent will also show a sneak preview of our upcoming automation platform.

Dr Danielle Folkard

Market Development Manager for the Universal NGS Library Preparation Portfolio, Qiagen

QIAGEN SOLUTIONS FOR CLINICAL GENOMICS

QIAGEN delivers Sample to Insight solutions for molecular testing. With Next Generation Sequencing becoming a routine technology in the clinical diagnostic laboratory, there is a need for scientifically robust, reproducible and cost-effective solutions. The QIAseq portfolio includes panels used in DNA & RNA Targeted Resequencing for oncology and inherited disease applications, library preparation kits for Infectious Disease testing, Immune Repertoire RNA Library kits for Immuno-oncology and single-cell compatibility for Non-Invasive Prenatal Testing and Preimplantation Genetic Diagnosis. With Single Primer Extension technology, Unique Molecular Identifiers and our complex understanding of PCR chemistry, QIAseq products provide increased accuracy, specificity and reproducibility to your workflows.

Professor Rachel M Chalmers

Cryptosporidium Reference Unit, Public Health Wales Microbiology and Health Protection, Singleton Hospital, Swansea, UK

Swansea University Medical School, Singleton Park, Swansea

Aberystwyth University, Penglais Hill, Aberystwyth

CRYPTOSPORIDIUM GENOMICS: WHERE HAVE WE COME FROM AND WHERE ARE WE GOING?

Cryptosporidium is a protozoan parasite that causes diarrhoeal disease in pre-weaned ruminant livestock and prolonged gastroenteritis (cryptosporidiosis) in humans, especially young children. Symptoms can last 2-3 weeks but are self-limiting in immuno-competent hosts; immuno-compromise can lead to severe, sometimes life-threatening disease. Most human disease in the UK is caused by C. parvum (zoonotic) or C. hominis (anthroponotic).

Routine stool diagnostics are based on microscopy of stained smears, detection of antigens by enzyme immunoassays, or detection of DNA by PCR for presence/absence of the genus. Although species-level genotyping is undertaken on positive stools referred to the Cryptosporidium Reference Unit, there is no standardised multilocus subtyping scheme for C. parvum and C. hominis, and Sanger sequencing of the gp60 gene is used to characterise isolates, especially in outbreaks.

The number of Cryptosporidium reference genomes released so far is limited and the diversity within and between Cryptosporidium species is still being discovered. The generation of new reference genomes requires either passage of clinical isolates through animals to collect sufficient oocysts or the use of whole genome amplification for subsequent genomic DNA recovery and sequence analysis. Either way, pre-preparation before DNA extraction is critical.

Nevertheless, the number of available genomes is increasing and improving our understanding of the biology, pathology and evolution of this parasite. At the conference I will describe some of the work undertaken to support public health applications of Cryptosporidium genomics.

Dr. Matt Berriman

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom

COMPARATIVE GENOMICS INSIGHTS INTO THE EVOLUTIONARY HISTORY OF MALARIA

Six species of malaria parasites infect humans but only one species, Plasmodium falciparum, is responsible for hundreds of thousands of deaths each year. P. falciparum is a member of a subgenus whose other members infect Great Apes but generally cause mild or asymptomatic infections. In 2008, a landmark study revealed that P. falciparum emerged from a lineage of parasites that currently infects gorillas. However, the differences between P. falciparum and its close relatives were poorly understood and dating the emergence of the species has been a matter of debate. Working with collaborators based in sanctuaries in Gabon, we purified parasites from blood samples taken during routine health checks. Using a variety of sequencing strategies with short and long reads, we were able to construct highly contiguous genome assemblies. In this presentation, I’ll describe how we estimated the times at which individual species emerged in the subgenus and, from comparative data, pieced together the major genomic events that led to the emergence of this fully human-infective and deadly species.

Dr. Matt Fisher

Department of Infectious Disease Epidemiology, St Mary's Hospital, Imperial College London.

BIG GENOMICS APPROACHES TO ADDRESSING BIG FUNGAL PROBLEMS

The Kingdom Fungi is a biodiverse and essential component of our habitable planet. However, recent decades have seen an increase in the number of pathogenic fungi infecting natural populations and managed landscapes. In both animals and plants, this increase in fungal diseases is causing some of the most severe die-offs and extinctions ever witnessed in wild species; fungi are also increasingly recognized as presenting a worldwide threat to food security and the healthy functioning of ecosystems. In parallel, clinicians and biomedical scientists are fighting emerging fungal pathogens that infect millions of people every year, and there are signs that fungi are becoming increasingly adapted to resist frontline antifungal therapies. Traditional approaches to studying the biology of fungal infections are currently being transformed by the growing number of high-quality assembled genomes, by world-wide surveys of population-genomic data and by new technological and informatic strategies. This talk will discuss current challenges in emerging fungal diseases in order to identify weaknesses in our armamentarium against fungal infections. Rapid progress is being made in our understanding of how to manage fungal disease in clinical and agricultural settings; however, mass-deployment of antifungal drugs and the development of monocultures have brought new risks to health and biosecurity. This talk will discuss how genomics is generating insights into what patterns of fungal disease might look like in the future and whether there are ways we can tackle the fungal pandemic.

Amber Leckenby

PhD student, Institute of Integrative Biology, University of Liverpool, Liverpool, UK

GENOMIC ANALYSES OF ENTAMOEBA HISTOLYTICA USING THIRD-GENERATION SEQUENCING

Entamoeba histolytica is an important pathogen of humans causing an estimated 100,000 deaths annually, often in some of the world’s poorest communities. However, most infections do not cause disease and there is growing interest in understanding how human and pathogen genetics determine the outcome of infection. The current reference genome was sequenced to 12.5X coverage and published in 2005. Subsequent efforts re-assembled and re-annotated this genome into a vast 1,496 scaffolds. This is largely due to the nature of the E. histolytica genome; it contains many long repetitive regions which are not spanned by the reads attained using Sanger, 454 and next-generation sequencing technologies. These long repeats have made assembling the E. histolytica genome particularly challenging. As a result, the large-scale structure of the E. histolytica genome is largely unknown, ploidy is debated and, unlike in other protozoan parasites, gene distribution along the chromosomes is not yet known. The fragmented nature of the assembly also means that sub-telomeric enrichment for large gene families and the existence of actual telomeric structures are yet to be identified. Here we present our current efforts to improve the E. histolytica HM-1:IMSS genome using single-molecule sequencing. We have utilised the long-insert paired reads that single-molecule sequencing produces, alongside the improved algorithms for assembling genomes, in a multi-platform approach. From this, we have elucidated large-scale structural characteristics such as the evidence of unique telomeric structures, rRNA episomes and transposable element-associated gene families. In addition, we have utilised bisulphite sequencing, alongside RNA-seq data, to study any epigenetic effects that may be important in determining amebiasis infection outcome.

Rajan Pandey

Authors: Rajan Pandey¹, Matthew Boucher¹, Maggie Lu², Aline Freville¹, Declan Brady¹, Mohammad Zeeshan¹, Edward Rea¹, Anthony A. Holder³, Richard Wall⁴, Karine Le Roch² and Rita Tewari¹

¹ School of Life Sciences, Queens Medical Centre, University of Nottingham, Nottingham, UK, ² Institute for Integrative Genome Biology, University of California, USA, ³ The Francis Crick Institute, London, UK, ⁴ School of Life Sciences, University of Dundee, Dundee, UK.

DECIPHERING PLASMODIUM CONDENSIN DURING ATYPICAL CELL DIVISION AND PROLIFERATION

Cell division and proliferation require chromosome replication and segregation to ensure two daughter cells with identical copies of the genome. Division and proliferation within host cells of Plasmodium, the causative agent of malaria, have features that differ from those typical of many eukaryotes due to the presence of unique kinases, cyclins and cell cycle regulators. Malaria parasites undergo two unusual forms of closed mitotic division: schizogony (which resembles endomitosis with repeated nuclear division without chromosome condensation and preceding cell division) and endo-reduplication during male gametogenesis (three rounds of rapid nuclear replication followed by cell division and chromosome condensation to form eight microgametes). In eukaryotes, Structural Maintenance of Chromosomes (SMC) proteins are implicated in chromosome segregation and condensation and most organisms have at least six genes encoding SMC proteins. In silico screening suggests the presence of six SMC genes in the Plasmodium genome but their role(s) in cell division are currently unknown. Here, we analysed the function, localisation and components of the condensin complex (formed on SMC2 and SMC4) during mitotic stages of the parasite life cycle using in silico, in vitro and in vivo techniques. Our results suggest that the condensin complex is essential for both types of mitotic cell division, but there are different protein partners in the complex during male gametogenesis and schizogony.

Rahila Sardar

Authors: Rahila Sardar¹,², Abhinav Kaushik¹*, Rajan Pandey¹*, Shakir Ali² and Dinesh Gupta¹

¹ Translational Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, New Delhi. ² Department of Biochemistry, Jamia Hamdard, New Delhi. *Equal contribution

GENOME WIDE IN SILICO ANALYSIS OF PLASMODIUM SPECIES TRANSCRIPTION FACTORS AND REGULATORS FOR NOVEL DRUG DISCOVERY.

The number of annotated and characterized Transcription Factors (TFs)/Transcription Associated Regulators (TARs) in apicomplexan species, e.g. Plasmodium, is exceptionally low. The spatial and temporal gene regulation during the complex life cycle of Plasmodium spp. makes the identification and characterization of TFs/TARs essential for understanding the parasite biology and identifying novel drug targets. Currently, there are only a limited number of experimentally validated Plasmodium regulatory proteins, most of which belong to the AP2 family. This motivated us to perform a genome-wide screening of TFs/TARs in six Plasmodium species using various in silico approaches. In summary, we have predicted ~500 TFs and ~2000 TARs, which are further classified into Transcription Regulators (TRs), Chromatin Regulators (CRRs) and RNA-regulators (RNARs) according to their gene ontology assignment. To understand TF and TAR expression dynamics, we scanned publicly available gene expression profiles (n=250; RNA-seq and microarray). The integrative analysis of the available stage-specific parasite gene expression data and our findings will provide useful insights into stage-specific Plasmodium transcription regulation.

Dr Alan Walker

Senior Lecturer, Rowett Institute, University of Aberdeen

THE HUMAN GUT MICROBIOME: MYTHS AND TRUTHS

Thousands of different microbial species are capable of colonising the human intestines, and it has been estimated that these microbes encode more than 10 million unique genes. Under normal circumstances our resident GI tract microbes are considered to play a number of key roles in the maintenance of human health. Conversely, alterations in microbiota composition and activities have been linked to a wide range of diseases. As a result, academic, clinical, public and commercial interest in the microbiota has increased exponentially over the last decade. There is now a concerted effort, involving researchers around the world, to better understand the microbiota, and to manipulate it for therapeutic purposes.

Much of the recent progress has been underpinned by technological advances in areas such as DNA sequencing approaches. While these approaches are hugely powerful, they have inherent limitations and biases. There have been a number of encouraging advances, but much work remains to be carried out before we truly understand the role the microbiota plays, and how we might reproducibly alter it in beneficial ways.

A key challenge for the field is to try and communicate exciting advances, while cutting through the hype that sometimes surrounds this burgeoning area of research. In my talk I will give an overview of current knowledge, and attempt to dispel some persistent and common myths surrounding the human gut microbiota.

Dr Lesley Hoyles

Nottingham Trent University

TOWARDS UNDERSTANDING THE ROLE OF THE GUT MICROBIOME IN FATTY LIVER DISEASE

Non-alcoholic fatty liver disease (NAFLD) refers to a group of conditions in which there is excess lipid accumulation (steatosis) in the liver of those who drink little to no alcohol. It is the most common cause of chronic liver disease, increasing in worldwide prevalence in line with the obesity epidemic, and is closely associated with metabolic syndrome. Animal studies have shown the microbiome contributes to the steatosis phenome, but human data are more limited regarding the role of the microbiome in disease onset/progression. Using an integrated systems biology approach it was possible to evaluate the contribution of the gut microbiome to the molecular phenome of steatosis. Various -omic (metagenomic, transcriptomic, metabolomic) and clinical data were collected for 56 non-diabetic, morbidly obese (BMI >35) women who elected for bariatric surgery. Histological examination of liver biopsies was used to grade steatosis. In common with other diseases, microbial gene richness was anti-correlated with steatosis. Even though only subtle compositional changes were observed in the faecal microbiota, increased abundance of Gram-negative Proteobacteria and microbial processing of dietary lipids and amino acids, as well as endotoxin-related processes related to Proteobacteria, were correlated with steatosis. Involvement of Proteobacteria in steatosis was reflected in the hepatic transcriptome, in which immune responses associated with non-specific (Gram-negative, viral) microbial infections were activated. Plasma levels of the microbiome-associated metabolite phenylacetic acid (PAA) were associated with steatosis. Treatment of primary human hepatocytes with PAA and feeding the metabolite to mice led to lipid accumulation in liver cells. The steatosis phenotype was transferred upon transplantation of faeces from steatosis patients to mice. Taken together, these results demonstrate the microbiome makes a significant contribution to the steatosis phenome. There is disruption of the gut–liver axis in steatosis, which can be detected in the gut microbiome, hepatic transcriptome and metabolome.

Joshua Quick

Authors: Joshua Quick & Nicholas J. Loman

Institute of Microbiology and Infection, University of Birmingham, B15 2TT

ASSESSING ULTRA-DEEP, LONG-READ METAGENOMICS ON OXFORD NANOPORE PROMETHION

The human gut microbiome is estimated to contain over 1000 microbial species with individuals harbouring >160 species[1]. However, species abundances are uneven, with many species present at very low abundances. Therefore, metagenomic sequencing approaches rely on large sequencing yields to detect this diversity. At present, most metagenomic surveys rely on high-output platforms such as Illumina. However, the short reads generated by these platforms limit specificity of taxonomic assignment and result in highly fragmented assemblies.

Single molecule sequencing platforms are able to sequence much longer molecules, but until recently have seen limited yields (<10 Gb per run). The Oxford Nanopore PromethION has recently entered 'alpha-beta' test phase and runs generating >100 Gb per flowcell have been reported by early testers. This increase in output suggests that the study of complex microbial communities using shotgun metagenomics may soon be practical.

To determine the suitability of the PromethION for long-read metagenomic studies we sequenced a microbial mock community to assess platform performance. The ZymoBIOMICS Microbial Community Standard II contains 10 species (5 Gram-positive and 3 Gram-negative bacteria, and 2 yeast species) at an uneven log-distribution of abundances ranging from 10^2 to 10^8 cells. The sample was extracted using FastPrep bead-beating and prepared for sequencing using the LSK-109 kit. In order to extract sufficient DNA for a nanopore run, six reactions were employed, meaning the lowest abundance organism (Staphylococcus aureus) is represented by only 600 cells.

A single PromethION flowcell generated a >130 Gb dataset with a mean read length of ~3.5 kb and an N50 of ~5 kb. The four highest abundance organisms gave coverage sufficient for whole-genome assembly (~38,000X, 370X, 344X and 87.8X for Listeria monocytogenes, Pseudomonas aeruginosa, Bacillus subtilis and Saccharomyces cerevisiae, respectively). Three more organisms yielded sufficient information for gene-detection studies (Escherichia coli, Salmonella enterica subsp. enterica and Lactobacillus fermentum). The remaining three species were confidently detected with unambiguous read mappings.
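For readers unfamiliar with the read-length statistics quoted above, the following is a minimal sketch of how a mean read length and a read N50 are conventionally computed from a list of read lengths. It is an illustration only, not the authors' analysis code; the function names and the toy read lengths are assumptions made for this example.

```python
def mean_length(lengths):
    """Arithmetic mean of the read lengths (in bases)."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest read downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical read lengths in bases, not data from the study).
toy_reads = [2000, 3000, 4000, 5000, 6000]
print(mean_length(toy_reads))  # 4000.0
print(n50(toy_reads))          # 5000
```

Because N50 weights each read by the bases it contributes, it is larger than the mean whenever the length distribution is skewed towards many short reads, which is consistent with the ~5 kb N50 and ~3.5 kb mean reported here.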

The PromethION metagenomic mock community dataset is a useful baseline measurement for assessing the use of long-read sequencing for studies of the microbiome on PromethION. Our results emphasise that alternatives to standard bead-beating are needed to generate longer reads for nanopore sequencing.

[1] Qin, J., et al., A human gut microbial gene catalogue established by metagenomic sequencing. Nature, 2010. 464(7285): p. 59-65.

Benjamin Thomas

Authors: Ben Thomas (Aberystwyth University), Sharon Huws and Chris Creevey (Queen’s University, Belfast), Kai Hilpert (St. George’s Hospital, London)

PILLS 'N' THRILLS AND BELLYACHES: USING AMPLY FOR COMPUTATIONAL NOVEL ANTIBIOTIC DISCOVERY IN REALLY STRANGE PLACES.

Bacterial antibiotic resistance is widely regarded to be one of the most pressing threats facing humanity. Finding new antibiotics is a vital research area and can now be supported by a vast reservoir of readily available 'omic data on the back of the explosion of low-cost sequencing technologies.

Antimicrobial Peptides (AMPs) are endogenous peptides that provide a fast and effective means of defence against pathogens as part of the innate immune response. The detection of AMPs in metagenomic data is a tantalising low-hanging fruit for computational biologists. Large reservoirs of existing sequences are available and are well annotated and understood. Post-computational wet-lab work is relatively cheap, with spot synthesis of peptides available at low cost from a wide array of third-party companies. A well-organised screening program can screen 100s of prospects a day against model bacterial organisms to test for activity and is one of the few areas of biological science that can scale to meet the data output from computational prediction toolkits.

AMPLY, an in-house tool designed at Aberystwyth University, supported by Life Science Wales and working in collaboration with St. George's Hospital (London) and Queen's University (Belfast), is part of a next wave of computational drug discovery platforms and is already uncovering a treasure trove of novel AMPs in diverse microbial environments. We highlight the significant benefits of forming a link between the understanding of microbial community dynamics, directed bioinformatics and confirmatory lab screening and the numerous novel antimicrobials identified to date.

Katherine Brown

Authors: Katherine Brown¹, Andrew E. Firth¹

¹Division of Virology, Department of Pathology, University of Cambridge

IDENTIFICATION OF VIRAL TRANSCRIPTS IN RNA-SEQ DATASETS FROM BEES, MITES AND ANTS

Honey bees play a vital role in global food production and are of great economic importance. Since 2006, many bee colonies have suffered large losses due to colony collapse disorder, the cause of which is unknown. This phenomenon has led to widespread efforts in sequencing honey bee pathogens, most notably RNA viruses such as chronic bee paralysis virus, deformed wing virus and sacbrood virus. However, honey bees coexist with a number of other arthropods, whose viruses are less thoroughly characterised. In particular, parasitism by Varroa mites is almost ubiquitous amongst honey bees. These mites are known to act as effective vectors for a number of RNA viruses and are widely considered to have been instrumental in the spread of colony collapse disorder. However, little is known about viruses endemic to mites, so it is difficult to determine the extent to which mites impact the bee virome. Ants have also been shown to introduce and exchange viruses with bees. As ants are closely related to bees (both are members of the Hymenoptera order), they have the potential to allow us to determine which viruses are specific to bees and which have a broader host distribution.

We have previously demonstrated that it is possible to detect and characterise viral RNA in publicly available RNA-seq datasets generated for other purposes. There are over 3,000 such datasets for diverse bee, mite and ant species. We have developed a computational pipeline to identify viral transcripts in these datasets. This pipeline performs quality control and adapter trimming, removes low complexity reads and reads generated from host RNA and various known contaminants, assembles the remaining reads into transcripts and detects the presence of regions with homology to known RNA viruses. Viral fragments identified with this pipeline will be examined phylogenetically to identify novel pathogens, clarify host range and specificity, and characterise transmission patterns. This will increase our understanding of the interplay between the viromes of these related arthropod species.

Sarah Hemmasi

Technical Marketing Assistant, Cambridge Bioscience

STANDARDIZING MICROBIOMICS - REMOVING BIAS IN COLLECTION, PURIFICATION AND ANALYSES

The rapid growth of Microbiomics has increased the demand for standard methods to improve the reproducibility and quality of the data being generated. To address these fundamental challenges, the scientists at Zymo Research have created reference materials for the development of the most accurate and unbiased workflows. Zymo Research has the goal to provide researchers the best tools for microbiome measurement to ensure standardized microbiomics workflows. The ZymoBIOMICS™ portfolio has been developed to eliminate bias across microbiomics workflows and offers a complete pipeline from start-to-finish for all your microbiome related needs.

The field of microbiomics has developed rapidly in the past several years. However, there are concerns due to poor data reproducibility across labs. To objectively assess the performance of different microbiomics workflows, it is essential to have accessible, well-defined, and accurately characterized mock microbial community standards to serve as reference materials for optimization, validation, and controls for microbiomic workflows. Acknowledging this deficit, the scientists at Zymo Research have created a well-characterized mock microbial community to be used as a reference material for microbiome measurements. Using this microbial standard, we assessed the performance of several of the most cited DNA extraction protocols used in the Microbiomics field and the effect of various library preparation techniques for 16S and shotgun sequencing. Thus, improving all steps involved from sample collection and DNA extraction to sequencing and bioinformatics will harmonize the data generated in this rapidly expanding field of research.

Dr Dario Riccardo Valenzano

Group Leader, Max Planck Institute for Biology of Ageing, Cologne, Germany

THE GENOMIC PHYLOGENY OF AFRICAN KILLIFISHES REVEALS PERVASIVE GENOME-WIDE RELAXATION OF SELECTION UPON ADAPTATION TO ANNUAL ENVIRONMENTS

African killifishes have evolved in a wide range of environments, from rain forest to arid savanna woodlands, characterised by intermittent water availability. However, the genomic events underlying adaptations to this range of environments are largely unknown. To study the evolutionary genomic events underlying adaptations to environments with different degrees of annual precipitation and temperatures, we sequenced the genomes of 45 African killifish species from different habitats, generating four de novo genome assemblies and annotations. Independent adaptations to annual environments are characterised by convergent positive selection and by extensive genome-wide relaxation of selective constraints, leading to a significant increase in genome size and to an excess of functional sequence divergence at highly conserved genes. Genomic resequencing in 235 individuals from two annual species, in populations ranging from dry to wet environments, revealed that individuals from dry environments have smaller effective population sizes and have undergone more severe bottlenecks, leading to a high frequency of deleterious mutations. Loss of selective constraints in species evolved in annual environments led to the accumulation of novel mutations at conserved sites in key ageing-modulating genes, including TOR, InsR, Ampk and Foxo3. Mitochondrially-encoded genes show significant accumulation of novel functional gene variants in all but one gene, showing that relaxation of selection pervades nuclear and mitochondrial genome evolution in species evolved in annual environments. We demonstrate that relaxation of selection is a major evolutionary force that moulds genome evolution in species evolving under annual environments, providing a fundamental mechanism to explain life history trait evolution.

Professor Bertie Gottgens

Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK.

Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK

A SINGLE-CELL RESOLUTION ROADMAP FROM MOUSE GASTRULATION TO EARLY ORGANOGENESIS

The generation of cellular diversity is a hallmark of all metazoan life. Across the animal kingdom, gastrulation represents the key developmental stage at which embryonic pluripotent cells diversify into the lineage-specific precursor cells that will generate the adult organism. Despite its fundamental importance, our understanding of mammalian gastrulation has remained far from complete because the limiting cell numbers in early embryos preclude conventional molecular analysis. I will discuss our recently generated transcriptional profiles for ~90,000 single cells from mouse embryos collected at nine sequential time-points ranging from 6.5 to 8.5 days post-fertilisation. We have used this new dataset to reconstruct a molecular roadmap of cellular differentiation from pluripotency towards all major embryonic lineages, and explore the complex molecular and cellular events involved in the convergence of visceral and primitive streak-derived endoderm. I will also outline how this work can serve as a vital baseline for understanding the effects of developmental gene mutations, as well as a critical step for the optimisation of in vitro differentiation protocols for regenerative medicine.

Anish Dattani

Authors: Anish Dattani, Damian Kao, Yuliana Mihaylova, Prasad Abnave, Samantha Hughes, Alvina Lai, Sounak Sahu, and Aziz Aboobaker

EPIGENETIC ANALYSES OF PLANARIAN STEM CELLS DEMONSTRATE CONSERVATION OF BIVALENT HISTONE MODIFICATIONS IN ANIMAL STEM CELLS

Planarian flatworms have an indefinite capacity to regenerate missing or damaged body parts owing to a population of pluripotent adult stem cells called neoblasts (NBs). Currently, little is known about the importance of the epigenetic status of NBs and how histone modifications regulate homeostasis and cellular differentiation. We have developed an improved and optimized ChIP-seq protocol for NBs in Schmidtea mediterranea and have generated genome-wide profiles for the active marks H3K4me3 and H3K36me3, and suppressive marks H3K4me1 and H3K27me3. The genome-wide profiles of these marks were found to correlate well with NB gene expression profiles. We found that genes with little transcriptional activity in the NB compartment but which switch on in post-mitotic progeny during differentiation are bivalent, being marked by both H3K4me3 and H3K27me3 at promoter regions. In further support of this hypothesis, bivalent genes also have a high level of paused RNA Polymerase II at the promoter-proximal region. Overall, this study confirms that epigenetic control is important for the maintenance of a NB transcriptional program and makes a case for bivalent promoters as a conserved feature of animal stem cells and not a vertebrate-specific innovation. By establishing a robust ChIP-seq protocol and analysis methodology, we further promote planarians as a promising model system to investigate histone modification-mediated regulation of stem cell function and differentiation.

Abdulkadir Abakir

Authors: Abdulkadir Abakir¹ & Alexey Ruzov¹

Affiliations: ¹ Wolfson Centre for Stem Cells, Tissue Engineering and Modelling (STEM), Division of Cancer and Stem Cells, School of Medicine, Centre for Biomolecular Sciences, University of Nottingham, University Park, Nottingham, NG7 2RD, UK.

N6-METHYLADENOSINE REGULATES CELL CYCLE DYNAMICS OF RNA:DNA HYBRIDS

R-loops are specific nucleic acid structures formed by an RNA:DNA hybrid and an unpaired single-stranded DNA¹. RNA:DNA hybrids contribute to a number of important biological processes ranging from transcriptional regulation to DNA repair and represent a source of genomic instability in mammalian cells²⁻⁴. Notably, the presence of non-canonical bases on the RNA component of R-loops has not been reported to date. Here we show that N6-methyladenosine (m6A), a modification involved in the regulation of mRNA stability and translation⁵,⁶, is present on the majority of RNA:DNA hybrids in human pluripotent stem cells (hPSCs). Moreover, we demonstrate that m6A-containing R-loops accumulate in the introns, LINE1 and SINE/Alu repeats during G2/M and are depleted at G0/G1 phases of the cell cycle in hPSCs. Furthermore, we show that YTHDF2, one of the previously characterized m6A reader proteins regulating mRNA degradation⁷, migrates to mitotic chromatin in dividing cells where it interacts with genomic regions enriched in RNA:DNA hybrids, and that another m6A reader, HNRNPA2B1⁸, binds to R-loop-containing intronic regions in interphase nuclei. Correspondingly, siRNA-mediated depletions of YTHDF2 and the m6A methyltransferase METTL3 both lead to an increase in repeat-specific and intronic RNA:DNA hybrids. Our results provide a new perspective on m6A as an integral component of RNA:DNA hybrids that is involved in the regulation of their cell cycle-specific degradation and imply potential roles for this modification in safeguarding genomic stability, regulating splicing and suppressing retro-transposition in hPSCs.

Alysha Taylor

Authors: Alysha S. Taylor¹,², Thomas A. Walsh³, Bede Constantinides¹, Niamh Forde², Mary J. O’Connell¹,⁴

¹Computational and Molecular Evolutionary Biology Group, School of Biology, Faculty of Biological Sciences, University of Leeds, LS2 9JT, UK. ²Discovery and Translational Sciences Department, Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine, University of Leeds, LS2 9JT, UK. ³Bioinformatics and Molecular Evolution Group, School of Biology, Dublin City University, Glasnevin, Dublin 9, Ireland. ⁴Computational and Molecular Evolutionary Biology Group, School of Life Sciences, University of Nottingham, NG7 2RD, UK.

A COMPARATIVE GENOMICS APPROACH TO IDENTIFY microRNAs SPECIFIC TO PLACENTAL MAMMALS.

The placenta emerged in the mammal lineage ~93 million years ago and defines the Eutherian clade. The evolution of placental tissue was likely facilitated by changes to both protein-coding and regulatory regions of the genome. MicroRNAs are non-protein-coding regulators of gene expression that act post-transcriptionally by binding primarily to the 3’UTR of mRNA and preventing or enhancing translation. Recent studies suggest that microRNAs have interesting properties: once a microRNA has emerged it is rarely lost, and expansions in microRNA families have been associated with periods of morphological innovation, such as the diversification of Bilateria. Using literature searches we assembled a set of 132 microRNAs that have a putative role in placental function. Large-scale sequence similarity searches of 20 high-quality vertebrate genomes (10 placental mammals; 2 non-placental mammals, 1 monotreme and 1 marsupial; and 8 non-mammalian vertebrates, including birds, reptiles, amphibians and fish) allowed us to identify 90 microRNAs that are specific to eutherian mammals and are not found elsewhere in the vertebrate tree of life. Using a presence/absence matrix constructed for these 90 microRNAs, a gain-loss analysis was performed in TNT (Tree analysis using New Technology). 11 microRNAs were found to have emerged on the stem placental mammal lineage and were never subsequently lost in any placental mammal tested. We propose that these microRNAs have contributed to the emergence of the placenta in mammals. The 11 stem eutherian microRNAs were then investigated using TargetScan 7.0 to determine potential protein-coding targets. Functional validations will be carried out in vitro to determine their roles in placental function. We present our findings thus far on the origin, evolution and function of placental microRNAs.
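The filtering logic behind a presence/absence matrix can be illustrated with a small Python sketch. This is not the authors' pipeline and it does not reproduce the parsimony-based gain-loss analysis run in TNT; the microRNA names, species lists and function names below are invented for illustration, and the sketch only shows the set logic of "present in eutherians, absent from all outgroups" and "never lost in any eutherian sampled".

```python
# Toy presence/absence data: microRNA name -> set of species with a detected hit.
# All names and values are hypothetical and chosen only to illustrate the logic.
EUTHERIANS = {"human", "mouse", "cow"}
OUTGROUPS = {"platypus", "opossum", "chicken", "zebrafish"}

presence = {
    "mir-A": {"human", "mouse", "cow"},
    "mir-B": {"human", "mouse"},
    "mir-C": {"human", "mouse", "cow", "chicken"},
    "mir-D": {"opossum", "zebrafish"},
}


def eutherian_specific(presence, ingroup, outgroup):
    """Return microRNAs found in at least one ingroup species and no outgroup species."""
    return sorted(
        name
        for name, species in presence.items()
        if species & ingroup and not species & outgroup
    )


def never_lost(presence, ingroup, outgroup):
    """Return ingroup-specific microRNAs that are present in every ingroup species."""
    return [
        name
        for name in eutherian_specific(presence, ingroup, outgroup)
        if ingroup <= presence[name]
    ]


specific = eutherian_specific(presence, EUTHERIANS, OUTGROUPS)
stem_candidates = never_lost(presence, EUTHERIANS, OUTGROUPS)
print(specific)
print(stem_candidates)
```

In this toy data set, mir-C is excluded because it also has a hit in a bird, and mir-B is eutherian-specific but missing from one eutherian, so only mir-A would count as a candidate that arose on the stem lineage and was retained.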

Dr Virginia Howick

Authors: Virginia Howick, Andrew Russell, Adam Ried, Tom Metcalf, Oliver Billker, Arthur Talman, Mara Lawniczak

Wellcome Sanger Institute, Hinxton, CB10 1SA, UK

A Malaria Cell Atlas: Understanding transcriptional variation across the Plasmodium life-cycle using single-cell RNA-seq.

Single-cell RNA-sequencing is revolutionizing our understanding of parasite populations. Using this technology, we are now able to understand how a unicellular parasite coordinates gene expression throughout its life-cycle and in response to environmental stimuli. Here we present the initial effort in a Malaria Cell Atlas, which currently consists of single-cell transcriptomes covering the entire Plasmodium berghei life-cycle, including stages in both mosquito and mammalian hosts. Using these data, we are able to finely map the developmental trajectory of the parasite across the life-cycle and have identified differentially expressed and highly variable genes across host environment and parasite phases (invasive, replicative, and sexual stages). We are currently adapting these methods to characterize the vector and host response at a single-cell level, with the goal of profiling all cellular players involved in the interplay leading to transmission and pathogenesis.

Professor Wolf Reik

Authors: Wolf Reik¹,²,³, Ferdinand von Meyenn¹, Melanie Eckersley-Maslin¹, Stephen Clark¹, Thomas Stubbs¹, Hisham Mohammed¹, Rebecca Berrens¹, Fatima Santos¹ & Wendy Dean¹

¹Epigenetics Programme, Babraham Institute, Cambridge CB22 3AT, ²Centre for Trophoblast Research, University of Cambridge, CB2 3EG, ³Wellcome Trust Sanger Institute, Cambridge CB10 1SA

SINGLE CELL EPIGENOME LANDSCAPE OF DEVELOPMENT AND AGEING

Epigenetic information is relatively stable in somatic cells but is reprogrammed on a genome-wide level in germ cells and early embryos. Epigenetic reprogramming appears to be conserved in mammals including humans. This reprogramming is essential for imprinting, and important for the return to naïve pluripotency including the generation of iPS cells, the erasure of epimutations, and perhaps for the control of transposons in the germ line. Following reprogramming, epigenetic marking occurs during lineage commitment in the embryo in order to ensure the stability of the differentiated state in adult tissues. Signalling and cell interactions that occur during these sensitive periods in development may have an impact on the epigenome with potentially long-lasting effects. The epigenome changes in a potentially programmed fashion during the ageing process; this epigenetic ageing clock seems to be conserved in mammals.

Our recent work addresses the mechanisms and consequences of global epigenetic reprogramming in the germ line, and the role of passive and active mechanisms of DNA demethylation. Using single cell multi-epigenomics techniques, we are beginning to chart the epigenetic and transcriptional dynamics and heterogeneity during the exit from pluripotency, symmetry breaking, and initial cell fate decisions leading up to gastrulation. We are also interested in the potentially programmed degradation of epigenetic information during the ageing process and how this might be coordinated across tissues and individual cells.

Dr Peter Vegh

Research Associate, Haniffa lab, Institute of Cellular Medicine, Medical School, Newcastle University

DISCOVERING THE DIVERSITY OF HUMAN SKIN IMMUNE CELLS USING SINGLE-CELL RNA-SEQ

Modern sequencing technologies allow single-cell transcriptome measurements of thousands of cells from a tissue sample. This has revolutionised our understanding of cellular heterogeneity within human tissues. In this presentation, I will outline our approach to deconstruct the cellular composition of human skin using single-cell RNA sequencing. Using a droplet-encapsulation platform to analyse ~100 000 skin cells from three adult donors, we demonstrate the cellular composition and functional organisation of healthy human skin.

Dr. Daniel Liber

WaferGen Biosystems, now part of Takara Bio

ICELL8 cx: THE OPEN PLATFORM FOR SINGLE-CELL GENOMICS

Single-cell genomics makes it possible to investigate cellular heterogeneity at an unprecedented resolution. The SMARTer ICELL8 cx Single-Cell System gives more control over the experimental design, more confidence in the data and unique workflow flexibility, while reducing experimental costs.

The ICELL8 multi-nanowell chip can isolate hundreds of cells from multiple samples at once, from the very small, like nuclei from frozen tissues, to the very large, like primary cardiomyocytes and 3D spheroids.

The SMARTer ICELL8 has been validated for multiple applications, including gene expression analysis, full-length transcriptomics, T-Cell Receptor sequencing and ATAC-seq, which have been developed by Takara Bio’s R&D or ICELL8 users.

Professor William F. Martin

Professor of Molecular Evolution at the University of Düsseldorf

IN SEARCH OF GENOME NUMBER 1: UNCOVERING THE GENOME OF THE FIRST MICROBE

Life is a chemical reaction, an exergonic chemical reaction. What was the chemical reaction from which the first cells arose, and what was the chemical reaction that fuelled the first free-living cells? These are questions about chemistry and physiology, but molecular evolution can contribute. The last universal common ancestor (LUCA) is the assemblage of cells from which all life evolved roughly four billion years ago. Genomes and phylogeny have yielded new avenues to understanding early evolution and LUCA. We know LUCA had the universal genetic code shared by all descendant life forms. But how did LUCA harness energy? The chemical reactions that help cells harness energy from their environments today seem almost as diverse as life itself. Which forms of energy harnessing are ancient? We looked at that question using data from sequenced microbial genomes. We found that LUCA lived from gases (H2, CO2, H2S, CO, N2) in a setting that looked very much like a modern submarine hydrothermal vent. The classical approach to investigating LUCA using genomes is to identify genes that are present in all modern cells and hence were present in LUCA. We asked which genes trace to LUCA by phylogenetic criteria. The results indicate that the first forms of life were anaerobic chemoautotrophs that evolved from preexisting geochemical processes involving exergonic reactions of H2, metals, and CO2.

Dr Jordi Paps

Jordi Paps; University of Essex & University of Oxford; jpapsm@essex.ac.uk

RECONSTRUCTION OF THE FIRST ANIMAL GENOME REVEALS A BURST OF GENOMIC NOVELTY

Understanding the emergence of the Animal Kingdom is one of the major challenges of modern evolutionary biology. Many genomic changes took place along the evolutionary lineage that gave rise to the Metazoa. Recent research has revealed the role that co-option of old genes played during this transition, but the contribution of genomic novelty has not been fully assessed. Using extensive genome comparisons between metazoans and multiple outgroups, we infer the minimal protein-coding genome of the first animal, in addition to other eukaryotic ancestors, and estimate the proportion of novelties in these ancient genomes. Contrary to the prevailing view, this uncovers an unprecedented increase in the extent of genomic novelty during the origin of metazoans, and identifies 25 groups of metazoan-specific genes that are essential across the Animal Kingdom. We argue that internal genomic changes were as important as external factors in the emergence of animals.

Fiona Whelan

Authors: Whelan FJ, Rusilowicz M, & McInerney JO

University of Nottingham

THE CO-OCCURRENCE AND CO-EXCLUSION OF EVOLVING OBJECTS IN PROKARYOTES

Throughout evolution, evolving objects (domains, genes, operons etc.) have continuously combined, forming new proteins, gene clusters, and genomes. Horizontal gene transfer, particularly among prokaryotes, has facilitated this combinatorial process. Thus, evolving objects that interact positively or synergistically with each other are expected to co-occur more often than by chance; conversely, evolving objects may avoid co-occurrence, indicating an antagonistic or redundant functionality between objects. In this work, we use methods adapted from graph theory to understand patterns of co-occurrence and exclusion in prokaryotes. We have implemented multi-level graph models in which each node (vertex) is a domain, gene, operon, or species connected by an edge (relationship) to another node to display these coincidence relationships. Our method incorporates the phylogenetic distribution and syntenic distances of evolving objects, and we demonstrate how these concepts can be used to identify conserved clusters of vertically and horizontally inherited units of selection. We apply these multi-level graph models to a variety of datasets including prokaryotic pangenomes, a representative set of prokaryotes, and metagenomic sequencing datasets from human-associated microbial communities. We find evidence for evolving objects that significantly co-occur with each other within each of these datasets; these genetic clusters include objects from characterized biological pathways but also include genes with unknown functions. Further, we identify genes that exclude each other, indicating evolving objects with antagonistic or redundant biological functions. This work represents a different approach to understanding the evolution of prokaryotes and allows us to draw novel hypotheses as to the potential role of these genetic clusters in prokaryote biology.
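As a rough illustration of the idea of a coincidence graph, the Python sketch below labels gene pairs as co-occurring or co-excluding by comparing observed joint presence across genomes with the count expected if the two genes were distributed independently. It is a deliberately naive stand-in, not the method described in the abstract: it ignores phylogeny and synteny, uses an arbitrary ratio threshold instead of a significance test, and all genome contents, names and thresholds are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical gene content of six genomes (genome id -> set of gene families).
genomes = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a", "b", "c"},
    "g4": {"d"},
    "g5": {"c", "d"},
    "g6": {"d"},
}


def coincidence_edges(genomes, min_ratio=1.5):
    """Label gene pairs by comparing observed and expected joint presence.

    Returns a dict mapping (gene_x, gene_y) to "co-occur" or "co-exclude".
    Pairs that fit neither label are left out of the graph.
    """
    n = len(genomes)
    genes = sorted(set().union(*genomes.values()))
    # Number of genomes carrying each gene.
    count = {g: sum(g in content for content in genomes.values()) for g in genes}
    edges = {}
    for x, y in combinations(genes, 2):
        observed = sum(x in c and y in c for c in genomes.values())
        expected = count[x] * count[y] / n  # expectation under independence
        if observed == 0:
            edges[(x, y)] = "co-exclude"
        elif observed / expected >= min_ratio:
            edges[(x, y)] = "co-occur"
    return edges


edges = coincidence_edges(genomes)
for pair, label in sorted(edges.items()):
    print(pair, label)
```

The resulting dictionary is an edge list: genes are the nodes and each labelled pair is an edge. A real analysis would need to correct for shared ancestry, since closely related genomes share genes for reasons unrelated to functional interaction.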

Dr Silvia Busoms

John Innes Centre, Norwich

ECOLOGICAL AND POPULATION GENOMICS REVEALS FLUCTUATING SELECTION ON MIGRANT ADAPTIVE SODIUM TRANSPORTER ALLELES IN COASTAL ARABIDOPSIS THALIANA

The outcrossing relatives Arabidopsis arenosa and Arabidopsis lyrata are increasingly the subjects of population genomic studies of adaptive evolution. These works provide case studies for how population genomics can be applied to targeted questions, from understanding the basis of adaptation to whole genome duplication (WGD) to the genomic basis of adaptation to extreme environments, including toxic mines and high salinity soils. I present an overview of our studies that allows for a large-scale investigation of within- and between-population evolutionary dynamics in this model genus.

We individually resequenced ~600 A. arenosa genomes from 70 diploid and autopolyploid populations, allowing the dating and ordering of successive selective sweeps as lineages follow distinct evolutionary trajectories and diversify across Europe. We integrate these data with 120 A. lyrata and Arabidopsis halleri genomes for a genus-wide view of the genomic basis of diverse adaptations. In A. arenosa, we observe that the population genomic consequences of WGD are pervasive: following WGD there is evidence of a reduced efficacy of purifying selection, with an increase in non-synonymous polymorphisms, and patterns of linkage disequilibrium differ dramatically between ploidies. Autotetraploid diversity is further enriched via local introgression from distantly related diploid populations to the extent that the signal of tetraploid monophyly is largely erased, except at discrete loci resistant to interploidy introgression. Examples of such barrier loci encode alleles that mediate adaptation to WGD. In addition, the tetraploids specifically exchange compelling candidate alleles for interspecies adaptive gene flow with autotetraploid A. lyrata. We hypothesise that the combined effects of initial masking of deleterious mutations, a higher proportion of adaptive substitutions and rampant interploidy (and interspecies) introgression likely all conspire to shape the evolutionary potential of these young autopolyploids.

Peter Mulhair

Authors: Peter O. Mulhair¹,², Raymond J. Moran³, Chris J. Creevey⁴, Bede Constantinides¹, Ian M. Carr⁵, James O. McInerney⁶,⁷, Davide Pisani⁸, Mary J. O’Connell¹,²,³*

¹Computational and Molecular Evolutionary Biology Group, School of Biology, Faculty of Biological Sciences, University of Leeds, LS2 9JT, UK, ²Computational and Molecular Evolutionary Biology Group, School of Life Sciences, University of Nottingham, NG7 2RD, UK, ³Bioinformatics and Molecular Evolution Group, School of Biology, Dublin City University, Glasnevin, Dublin 9, Ireland, ⁴School of Biological Sciences, Queen’s University Belfast, BT7 1NN, UK, ⁵School of Medicine, St James’s Hospital, University of Leeds, Leeds, LS2 9JT, UK, ⁶Division of Evolution and Genomic Sciences, University of Manchester, M13 9PL, UK, ⁷School of Life Sciences, University of Nottingham, NG7 2RD, UK, ⁸School of Biological Sciences, University of Bristol, BS8 1TH, UK

GENE FUSION EVENTS IN METAZOA - PATTERNS OF EMERGENCE AND POTENTIAL USE AS PHYLOGENETIC MARKERS

Molecular systematics has resolved key evolutionary relationships amongst Animalia. However, controversy remains over the composition of major clades such as the Spiralia and Panarthropoda and the relationships between lineages such as those at the root of the animal tree, between Porifera and Ctenophora. Recent studies have applied an array of different approaches to resolve these issues, including the use of alternative types of sequence data and the application of complex models of evolution. Rare Genomic Events (RGEs), such as insertion-deletion events and microRNAs, have proven powerful in the resolution of highly debated relationships such as the root of the placental mammal tree and the placement of Tardigrada. Here we have investigated the evolutionary properties of a specific type of RGE: gene fusion. From a dataset of 1.2 million protein-coding genes from 63 metazoan genomes we identified a total of 26,507 gene fusion families that are distributed across the species sampled. Following alignment and comparison of the gene fusions with their parent genes, we mapped gene fusion family evolution throughout the animal tree. We found large numbers of gene fusion families emerging at distinct nodes in the animal tree, e.g. the origin of Bilateria, Gnathostomata and Mammalia. Subsequently, we assessed the characteristics of the gene fusion families to determine if they have potential as phylogenetic markers. We used well-resolved nodes to test whether the presence/absence pattern of gene fusions in our dataset recapitulates the known topology in these regions. High rates of secondary loss demonstrated that these markers may not prove useful for phylogenetic reconstruction. However, the clustering of gene fusion events onto specific nodes of the animal tree suggests that they may play an important role in the evolution of complex phenotypic traits in animals.

Dr Gabriel Rech

Authors: Gabriel E. Rech¹, Véronique Jamilloux², Hadi Quesneville² and Josefa González¹

¹ Institute of Evolutionary Biology (IBE-CSIC-UPF), Barcelona, Spain. ² Unite de Recherche Genomique Info (URGI-INRA), Versailles, France.

UNRAVELLING TRANSPOSABLE ELEMENT DIVERSITY USING LONG-READ SEQUENCING

The large number of genomic sequences obtained in the last decades, combined with the development of powerful comparative genomics methods, has allowed the discovery of several mechanisms by which genomes are able to change over time. However, despite its great importance for biology, we still know very little about the extent to which changes in DNA sequences are functional (i.e. show a related phenotype). This is partly because phenotypes are the result of a complex relationship between genetics and environment, but also because most studies are focused on the analysis of small mutations (e.g. SNPs) or changes in protein-coding genes (e.g. duplications, gene gain/loss, exon shuffling, pseudogenes, etc.) while other types of mutations are usually ignored. This is, for instance, the case for transposable element (TE) insertions, whose relevance is mainly determined by their ability to move from one location of the genome to another, generating a great variety of mutations during the process. The role of TEs in genome evolution has already been demonstrated in several organisms. For instance, in the fruit fly Drosophila melanogaster, TEs have been found to be related to adaptive traits associated with the response to stress. However, comprehensive genome-wide analyses of TEs remain limited, mainly because their repetitive nature hampers their identification and characterization, particularly when using short-read sequencing technologies. To overcome this limitation, we sequenced and assembled D. melanogaster genomes from natural populations using long-read sequencing technologies (Pacific Biosciences and Oxford Nanopore) and we performed a de novo annotation of TEs using the REPET package, which integrates state-of-the-art bioinformatics software for the identification and annotation of TEs. We obtained high-quality genome assemblies and we were able to annotate new TEs on these genomes. Compared with the official reference annotation (FlyBase), we identified 97.5% of the TEs already annotated but also found 3,160 TE fragments that were not previously known. In addition, we found great variation in TE content among the genomes, with the percentage of the genome covered by TEs ranging from 18% to 29%. Our results not only demonstrate the increase in power when applying these new technologies to the discovery and study of TEs, but also unravel the huge TE diversity among natural populations of D. melanogaster, suggesting that the role of TE insertions in shaping genome architecture and evolution could be highly underestimated.

Professor Chris Ponting

Group Leader, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh

Authors: Neil Clark, Giuseppe Gallone, Olympia Gianfrancesco, Chris P Ponting

IDENTIFYING CAUSAL VARIANTS IN COMPLEX DISEASE THAT ALTER TRANSCRIPTION FACTOR BINDING

Genome-wide association studies have linked genetic variation with a very large number of traits and diseases. What we do not know, however, in most cases, is which specific DNA variant is causal for the change in trait or in disease susceptibility. If we are to understand complex traits mechanistically then we need to identify such variants. Our approach differs from most in asking whether any DNA variants that alter the affinity of a transcription factor are causally responsible for a change in any trait. Consequently, we start with the molecular mechanism and determine for which trait this mechanism explains its variation. In our first application of this approach we have used published vitamin D receptor-binding data to identify >10 single nucleotide variants as causal of trait variation.

Devika Agarwal

Authors: Devika Agarwal¹,², Elena Di Daniel², John B. Davis², Caleb Webber¹

¹ Department of Physiology, Anatomy, and Genetics, University of Oxford, Oxford, UK; ² Alzheimer's Research UK Oxford Drug Discovery Institute, Nuffield Department of Medicine Research Building, University of Oxford, Oxford.

A META-ANALYSIS OF MICROGLIAL TRANSCRIPTOMIC DATASETS TO PRIORITIZE DISEASE RELATED TARGETS AND MECHANISMS IN NEURO-INFLAMMATION AND NEURODEGENERATION IN ALZHEIMER'S DISEASE

Alzheimer’s disease (AD) is a devastating neurodegenerative disease with no effective treatments. An estimated 50 million people worldwide were living with dementia in 2018 and this is projected to increase to 152 million by 2040. Analyses of large-scale gene expression data from human postmortem tissue, animal models and cellular disease models are identifying dysregulated tissue-specific gene modules whose convergent functionality can be demonstrated by clustering within various networks (phenotypic linkage networks, co-expression and protein-protein networks). By causally grounding these gene modules through their links to risk-associated genetic variants, we can identify both disease-causing mechanisms and novel drug targets for the treatment of complex diseases like AD. In the context of neuroinflammation, microglia are resident immune cells of the central nervous system (CNS) with important physiological functions such as homeostasis, plasticity, immunity and repair in the brain. Emerging transcriptomic and genetic studies have indicated the involvement and dysregulation of microglia-related pathways as central to the risk and pathogenesis of AD and other neurodegenerative diseases. The advent of human iPSC-derived cellular systems has provided an opportunity to develop ‘human’ in vitro models, but these models need qualification against more physiological and relevant data derived from human patient tissue and whole animal models. This work aims to identify the most relevant paradigms for use with human iPSC-derived microglia. Here, we have identified a neurodegeneration-specific microglial disease axis by integrating studies involving sorted microglia from mouse models of disease (e.g. App-Psen1, Tau P301L, Trem2 knockouts). Different microglial challenges and treatment signatures were projected onto the disease axis to compare their relevance to disease and thus identify the most relevant microglial challenge to study neurodegeneration in human iPSC cell culture models. These signatures were then compared and contrasted with transcriptomic studies from human postmortem brains or cellular models and the CMAP database to identify microglia-specific drug targets for AD.

Dr Mirjana Efremova

Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK

DECODING CELL-CELL INTERACTIONS AT THE MATERNAL-FETAL INTERFACE USING SINGLE CELL TRANSCRIPTOMICS

During the early weeks of human pregnancy, the fetal placenta implants into the uterine mucosa (decidua) where placental trophoblast cells intermingle and communicate with maternal cells. Here, we profile transcriptomes of ~50,000 single cells from this unique microenvironment, sampling matched first trimester maternal blood and decidua, and fetal cells from the placenta itself. We define the cellular composition of human decidua, revealing five distinct subsets of decidual fibroblasts with differing growth factors and hormone production profiles, and show that fibroblast states define two distinct decidual layers. Among decidual NK cells, we resolve three subsets, each with a different immunomodulatory and chemokine profile. We develop a repository of ligand-receptor pairs (www.CellPhoneDB.org) and a statistical tool to predict the probability of cell-cell interactions via these pairs, highlighting specific interactions between decidual NK cells and invading fetal extravillous trophoblast cells, maternal immune and stromal cells. Our single cell atlas of the maternal-fetal interface reveals the cellular organization and interactions critical for placentation and reproductive success.

Martin Fahrenberger

Authors: Martin Fahrenberger¹, Michael Altenbuchinger¹, Rainer Spang¹

¹ Department of Statistical Bioinformatics, University of Regensburg, Regensburg, Germany

EXPLORING SPATIAL TRANSCRIPTOMICS: FROM TISSUE SECTIONS TO GENE EXPRESSION PATTERNS

Spatial Transcriptomics is a recently developed sequencing technology allowing for the RNA-seq analysis of hundreds of small spots within a tissue section. This is possible at near single-cell resolution, while maintaining 2D positional information. It enables the spatial analysis and visualisation of expressed genes across tissue sections, a task which typically required staining against single proteins.

The positional information in Spatial Transcriptomics is maintained through the ligation of spatial barcodes onto the mRNA of lysed cells in a spot. Later, all spots are sequenced collectively (similar to many scRNA-seq protocols). Reads are then separated by barcode during the data analysis; this requires additional steps that differ from standard RNA-seq pipelines.
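The barcode-based separation step can be sketched in a few lines of Python. This is an illustrative toy, not the bash pipeline presented in the talk: the barcode sequences, spot coordinates, read layout and function name are all assumptions, and real data would additionally need handling of sequencing errors in barcodes, UMIs and alignment.

```python
from collections import defaultdict

# Hypothetical lookup table: spatial barcode -> (row, column) of the spot on the array.
BARCODE_TO_SPOT = {
    "ACGT": (0, 0),
    "TTAG": (0, 1),
    "GGCA": (1, 0),
}

# Hypothetical reads, each given as (spatial barcode, cDNA sequence).
reads = [
    ("ACGT", "TTTGGGCCC"),
    ("TTAG", "AAACCCGGG"),
    ("ACGT", "CCCGGGAAA"),
    ("NNNN", "GGGTTTAAA"),  # barcode not in the lookup table
]


def split_by_spot(reads, barcode_to_spot):
    """Group read sequences by spot; collect reads with unknown barcodes separately."""
    per_spot = defaultdict(list)
    unassigned = []
    for barcode, sequence in reads:
        spot = barcode_to_spot.get(barcode)
        if spot is None:
            unassigned.append(sequence)
        else:
            per_spot[spot].append(sequence)
    return dict(per_spot), unassigned


per_spot, unassigned = split_by_spot(reads, BARCODE_TO_SPOT)
for spot, sequences in sorted(per_spot.items()):
    print(spot, len(sequences))
print("unassigned:", len(unassigned))
```

Once reads are grouped by spot, each group can be treated like one sample in a standard RNA-seq quantification, and the spot coordinates let the resulting expression values be drawn back onto the tissue section.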

We present our own, easy to understand, bash-pipeline for the analysis of Spatial Transcriptomics data. It combines common RNA-seq and scRNA-seq tools in a simple script for flexibility and adaptability. The pipeline will be available at https://github.com/Martin-Fahrenberger/.

We further present novel approaches for the identification of genes with interesting expression patterns across tissue sections, inspired by methods in image-processing and pattern-recognition. These methods facilitate novel analyses; from the exploration of cell-type markers to the identification and localization of cancer metastases within a tissue. We demonstrate these capabilities on publicly available Spatial Transcriptomics data-sets of mouse olfactory bulb and human breast cancer samples.

Dr Sarah Bastkowski

Authors: Sarah Bastkowski, Tarang Mehta, Sushmita Roy, Will Nash, Padhmanand Sudhakar, Wilfried Haerty, Federica Di Palma

Earlham Institute

NETWORKS TO CATCH THE DIFFERENCE: CONSTRUCTION AND ANALYSIS OF REGULATORY NETWORKS APPLIED TO EAST AFRICAN LAKE CICHLIDS

In the post-genomic era we face the challenge of assigning function and context to the vast amount of data that is created every day. Therefore, we need methods to extract complex signals from large datasets and visualise them in a comprehensive way. Networks have gained great popularity in life science as they allow us to model different regulatory states. In particular, the evolution of regulatory networks can be useful for studying speciation and adaptation processes, like species-specific trait development. Here, we describe our network reconstruction approach, as well as our assessment of network divergence, which has been applied to 5 cichlid species from the East African lakes. In this approach, we make use of diverse regulatory "omics" datasets and enable the investigation of tissue-specific regulatory networks in an evolutionary context.

Ivan K. Lukić, MD, PhD

Senior Field Application Scientist, Partek

SINGLE CELL RNA-SEQ DATA ANALYSIS WITH PARTEK® FLOW®

Partek Flow is a flexible and intuitive bioinformatics platform that can be used in a wide variety of genomics applications. In this presentation, we will show how to analyze a single cell RNA-Seq experiment using rigorous statistical tools and compelling exploratory analysis. With the easy to use graphical user interface, you can import single cell data from any platform, perform QA/QC, normalize & filter data, correct for batch effects, classify cells using a variety of different methods, visualize cells using 2D or 3D tSNE and PCA plots, detect differentially expressed genes and perform biological interpretation. Among other features, the software can be installed in the cloud, on a local server or in a cluster environment, making Partek Flow a complete solution for data analysis.

Wil Wellington

Product Director, Verne Global

COMPUTATIONAL BIOLOGY IN THE CLOUD. HOW HIGH PERFORMANCE COMPUTING (HPC) WITHIN THE CLOUD IS ACCELERATING OUR ABILITY TO ANALYSE COMPLEX BIOLOGICAL COMPUTE WORKLOADS EFFICIENTLY

HPC in the cloud is revolutionising intensive bio-computational workloads. This session will run through the considerations the UK research and genomics community need to be aware of when investing in and utilising cloud HPC, and how carefully attuned ‘TrueHPC’ is able to provide genuine advantage in terms of end-use experience, scalability and CAPEX savings.
