21 Legit Research Databases for Free Journal Articles in 2022


Written by Scribendi

Has this ever happened to you? While looking for websites for research, you come across a research paper site that claims to connect academics to a peer-reviewed article database for free.

Intrigued, you search for keywords related to your topic, only to discover that you must pay a hefty subscription fee to access the service. After the umpteenth time being duped, you begin to wonder if there's even such a thing as free journal articles.

Subscription fees and paywalls are often the bane of students and academics, especially those at small institutions that can't provide access to many article directories and repositories.

Whether you're working on an undergraduate paper, a PhD dissertation, or a medical research study, we want to help you find tools to locate and access the information you need to produce well-researched, compelling, and innovative work.

Below, we discuss why peer-reviewed articles are superior and list the best free article databases to use in 2022.


Why Peer-Reviewed Scholarly Journal Articles Are More Authoritative

Peer-Reviewed Articles

Determining what sources are reliable can be challenging. Peer-reviewed scholarly journal articles are the gold standard in academic research. Reputable academic journals have a rigorous peer-review process.

The peer review process provides accountability to the academic community, as well as to the content of the article. It involves qualified experts in a specific (often very specific) field reviewing an article's methods and findings to assess its quality and credibility.

Peer-reviewed articles can be found in peer-reviewed article databases and research databases, and if you know that a database of journals is reliable, that can offer reassurances about the reliability of a free article. Peer review is often double blind, meaning that the author removes all identifying information and, likewise, does not know the identity of the reviewers. This helps reviewers maintain objectivity and impartiality so as to judge an article based on its merit.

Where to Find Peer-Reviewed Articles

Peer-reviewed articles can be found in a variety of research databases. Below is a list of some of the major databases you can use to find peer-reviewed articles and other sources in disciplines spanning the humanities, sciences, and social sciences.

What Are Open Access Journals?

An open access (OA) journal is a journal whose content can be accessed without payment. This provides scholars, students, and researchers with free journal articles. OA journals use alternate methods of funding to cover publication costs so that articles can be published without having to pass those costs on to the reader.

Open Access Journals

Some of these funding models include standard funding methods like advertising, public funding, and author payment models, where the author pays a fee in order to publish in the journal. There are OA journals that have non-peer-reviewed academic content, as well as journals that focus on dissertations, theses, and papers from conferences, but the main focus of OA is peer-reviewed scholarly journal articles.

The internet has certainly made it easier to access research articles and other scholarly publications without needing access to a university library, and OA takes another step in that direction by removing financial barriers to academic content.

Choosing Wisely

Features of Legitimate OA Journals

There are several things to look for when trying to decide if a free journal is legitimate:

Mission statement —The mission statement for an OA journal should be available on their website.

Publication history —Is the journal well established? How long has it been available?

Editorial board —Who are the members of the editorial board, and what are their credentials?

Indexing —Can the journal be found in a reliable database?

Peer review —What is the peer review process? Does the journal allow enough time in the process for a reliable assessment of quality?

Impact factor —How often, on average, are the journal's articles cited over a two-year period?

Features of Illegitimate OA Journals

There are predatory publications that take advantage of the OA format, and they are something to be wary of. Here are some things to look out for:

Contact information —Is contact information provided? Can it be verified?

Turnaround —If the journal makes dubious claims about the amount of time from submission to publication, it is likely unreliable.

Editorial board —Much like determining legitimacy, looking at the editorial board and their credentials can help determine illegitimacy.

Indexing —Can the journal be found in any scholarly databases?

Peer review —Is there a statement about the peer review process? Does it fit what you know about peer review?

How to Find Scholarly Articles

Identify keywords.

Keywords are terms the author includes with an article, and they are an excellent way to find content relevant to your research topic or area of interest. In academic searches, much like on a search engine, you can use keywords to navigate what is available and find exactly what you're looking for.

Authors provide keywords that will help you easily find their article when researching a related topic, often including general terms to accommodate broader searches, as well as some more specific terms for those with a narrower scope. Keywords can be used individually or in combination to refine your scholarly article search.

Narrow Down Results

Sometimes, search results can be overwhelming, and searching for free articles on a journal database is no exception, but there are multiple ways to narrow down your results. A good place to start is discipline.

What category does your topic fall into (psychology, architecture, machine learning, etc.)? You can also narrow down your search with a year range if you're looking for articles that are more recent.

A Boolean search can be incredibly helpful. This entails placing operators between keywords: AND when both keywords must appear in your results, OR when either may appear, and NOT when a keyword should be excluded from the results.
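To make the AND / OR / NOT semantics concrete, here is a small, self-contained Python sketch that filters a list of titles the way a Boolean search does. The titles and the `matches` helper are invented for illustration; real databases apply the same logic to their own indexes.

```python
# Toy illustration of Boolean search operators over a list of titles.
# The titles below are invented examples, not real search results.
titles = [
    "Migration and Climate Change",
    "Climate Models in Machine Learning",
    "Urban Migration Patterns",
]

def matches(title, must=(), either=(), exclude=()):
    """Return True if the title satisfies AND / OR / NOT keyword rules."""
    t = title.lower()
    return (
        all(k.lower() in t for k in must)                         # AND: every keyword required
        and (not either or any(k.lower() in t for k in either))  # OR: at least one, if given
        and not any(k.lower() in t for k in exclude)             # NOT: none of these
    )

# "migration AND climate" keeps only titles containing both keywords
print([t for t in titles if matches(t, must=("migration", "climate"))])
# "climate NOT migration" drops any title mentioning migration
print([t for t in titles if matches(t, must=("climate",), exclude=("migration",))])
```

Combining operators this way is exactly how narrowing by discipline and year range works in most database interfaces: each extra condition is another AND clause.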

Consider Different Avenues

If you're not having luck using keywords in your search for free articles, you may still be able to find what you're looking for by changing your tactics. Casting a wider net sometimes yields positive results, so it may be helpful to try searching by subject if keywords aren't getting you anywhere.

You can search for a specific publisher to see if they have OA publications in the academic journal database. And, if you know more precisely what you're looking for, you can search for the title of the article or the author's name.

The Top 21 Free Online Journal and Research Databases

Navigating OA journals, research article databases, and academic websites trying to find high-quality sources for your research can really make your head spin. What constitutes a reliable database? What is a useful resource for your discipline and research topic? How can you find and access full-text, peer-reviewed articles?

Fortunately, we're here to help. Having covered some of the ins and outs of peer review, OA journals, and how to search for articles, we have compiled a list of the top 21 free online journals and the best research databases. This list of databases is a great resource to help you navigate the wide world of academic research.

These databases provide a variety of free sources, from abstracts and citations to full-text, peer-reviewed OA journals. With databases covering specific areas of research and interdisciplinary databases that provide a variety of material, these are some of our favorite free databases, and they're totally legit!

1. CORE

CORE is a multidisciplinary aggregator of OA research with the largest collection of OA articles available, allowing users to search more than 219 million of them. While most of these link to the full-text article on the original publisher's site, or to a PDF available for download, five million records are hosted directly on CORE.

CORE's mission statement is a simple and straightforward commitment to offering OA articles to anyone, anywhere in the world. They also host communities that are available for researchers to join and an ambassador community to enhance their services globally. In addition to a straightforward keyword search, CORE offers advanced search options to filter results by publication type, year, language, journal, repository, and author.

CORE's user interface is easy to use and navigate. Search results can be sorted based on relevance or recency, and you can search for relevant content directly from the results screen.

Collection: 219,537,133 OA articles

Other Services: Additional services are available from CORE, with extras that are geared toward researchers, repositories, and businesses. There are tools for accessing raw data, including an API that provides direct access to data, datasets that are available for download, and FastSync for syncing data content from the CORE database.

CORE has a recommender plug-in that suggests relevant OA content in the database while conducting a search and a discovery feature that helps you discover OA versions of paywalled articles. Other features include tools for managing content, such as a dashboard for managing repository output and the Repository Edition service to enhance discoverability.

Good Source of Peer-Reviewed Articles: Yes

Advanced Search Options: Language, author, journal, publisher, repository, DOI, year
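As a rough sketch of what using CORE's data API might look like: the code below assumes the v3 search endpoint (`api.core.ac.uk/v3/search/works`), a `q`/`limit` query-string format, and a Bearer-token `Authorization` header, based on CORE's public API documentation; `YOUR_API_KEY` is a placeholder for a real key.

```python
import json
import urllib.parse
import urllib.request

# Assumed CORE v3 search endpoint; check CORE's API docs for the current path.
API_BASE = "https://api.core.ac.uk/v3/search/works"

def build_search_url(query, limit=10):
    """Build a CORE keyword-search URL with an assumed q/limit parameter format."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    return f"{API_BASE}?{params}"

def search_core(query, api_key, limit=10):
    """Run the search; CORE is assumed to expect a Bearer token header."""
    req = urllib.request.Request(
        build_search_url(query, limit),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example usage (requires a real API key and network access):
# results = search_core("open access migration", api_key="YOUR_API_KEY")
# for work in results.get("results", []):
#     print(work.get("title"))
```

This is the kind of call the datasets and FastSync services build on; for bulk access, the downloadable datasets are usually the better fit.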

2. ScienceOpen

Functioning as a research and publishing network, ScienceOpen offers OA to more than 74 million articles in all areas of science. Although you do need to register to view the full text of articles, registration is free. The advanced search function is highly detailed, allowing you to find exactly the research you're looking for.

The Berlin- and Boston-based company was founded in 2013 to "facilitate open and public communications between academics and to allow ideas to be judged on their merit, regardless of where they come from." Search results can be exported for easy integration with reference management systems.

You can also bookmark articles for later research. There are extensive networking options, including your ScienceOpen profile, a forum for interacting with other researchers, the ability to track your usage and citations, and an interactive bibliography. Users have the ability to review articles and provide their knowledge and insight within the community.

Collection: 74,560,631

Other Services: None

Advanced Search Options:  Content type, source, author, journal, discipline

3. Directory of Open Access Journals

A multidisciplinary, community-curated directory, the Directory of Open Access Journals (DOAJ) gives researchers access to high-quality peer-reviewed journals. It has archived more than two million articles from 17,193 journals, allowing you to either browse by subject or search by keyword.

The site was launched in 2003 with the aim of increasing the visibility of OA scholarly journals online. Content on the site covers subjects from science, to law, to fine arts, and everything in between. DOAJ has a commitment to "increase the visibility, accessibility, reputation, usage and impact of quality, peer-reviewed, OA scholarly research journals globally, regardless of discipline, geography or language."

Information about the journal is available with each search result. Abstracts are also available in a collapsible format directly from the search screen. The scholarly article website is somewhat simple, but it is easy to navigate. There are 16 principles of transparency and best practices in scholarly publishing that clearly outline DOAJ policies and standards.

Collection: 6,817,242

Advanced Search Options:  Subject, journal, year

4. Education Resources Information Center

The Education Resources Information Center (ERIC) of the Institute of Education Sciences allows you to search by topic for material related to the field of education. Links lead to other sites, where you may have to purchase the information, but you can limit your search to full-text articles only. You can also limit it to peer-reviewed sources.

The service primarily indexes journals, gray literature (such as technical reports, white papers, and government documents), and books. All sources of material on ERIC go through a formal review process prior to being indexed. ERIC's selection policy is available as a PDF on their website.

The ERIC website has an extensive FAQ section to address user questions. This includes categories like general questions, peer review, and ERIC content. There are also tips for advanced searches, as well as general guidance on the best way to search the database. ERIC is an excellent database for content specific to education.

Collection: 1,292,897

Advanced Search Options: Boolean

5. arXiv e-Print Archive

The arXiv e-Print Archive is run by Cornell University Library and curated by volunteer moderators, and it now offers OA to more than one million e-prints.

There are advisory committees for all eight subjects available on the database. With a stated commitment to an "emphasis on openness, collaboration, and scholarship," the arXiv e-Print Archive is an excellent STEM resource.

The interface is not as user-friendly as some of the other databases available, but it is otherwise a straightforward math and science resource, and the website hosts a blog to provide news and updates. There are simple and advanced search options, and, in addition to conducting searches for specific topics and articles, users can browse content by subject. The arXiv e-Print Archive clearly states that it does not peer review the e-prints in the database.

Collection: 1,983,891

Good Source of Peer-Reviewed Articles: No

Advanced Search Options:  Subject, date, title, author, abstract, DOI
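arXiv also exposes its metadata through a public query API that returns Atom feeds. The sketch below follows arXiv's documented API conventions (the `export.arxiv.org/api/query` endpoint, the `all:` field prefix, and `start`/`max_results` paging parameters); the query itself is illustrative.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_BASE = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used in arXiv's Atom responses

def build_query_url(terms, start=0, max_results=5):
    """Build an arXiv API URL searching all fields for the given terms."""
    params = urllib.parse.urlencode({
        "search_query": f"all:{terms}",
        "start": start,
        "max_results": max_results,
    })
    return f"{API_BASE}?{params}"

def fetch_titles(terms, max_results=5):
    """Fetch matching e-print titles from the Atom feed."""
    with urllib.request.urlopen(build_query_url(terms, max_results=max_results)) as resp:
        feed = ET.parse(resp).getroot()
    return [entry.find(f"{ATOM}title").text for entry in feed.iter(f"{ATOM}entry")]

# Example usage (requires network access):
# for title in fetch_titles("quantum computing"):
#     print(title)
```

Field prefixes such as `ti:` (title) and `au:` (author) can replace `all:` to mirror the advanced search options listed above.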

6. Social Science Research Network

The Social Science Research Network (SSRN) is a collection of papers from the social sciences community. It is a highly interdisciplinary platform used to search for scholarly articles related to 67 social science topics. SSRN has a variety of research networks for the various topics available through the free scholarly database.

The site offers more than 700,000 abstracts and more than 600,000 full-text papers. There is not yet a specific option to search for only full-text articles, but, because most of the papers on the site are free access, it's not often that you encounter a paywall. There is currently no option to search for only peer-reviewed articles.

You must become a member to use the services, but registration is free and enables you to interact with other scholars around the world. SSRN is "passionately committed to increasing inclusion, diversity and equity in scholarly research," and they encourage and discuss the use of inclusive language in scholarship whenever possible.

Collection: 1,058,739 abstracts; 915,452 articles

Advanced Search Options: Term, author, date, network

7. Public Library of Science

Public Library of Science (PLOS) is a big player in the world of OA science. Publishing 12 OA journals, the nonprofit organization is committed to facilitating openness in academic research. According to the site, "all PLOS content is at the highest possible level of OA, meaning that scientific articles are immediately and freely available to anyone, anywhere."

PLOS outlines four fundamental goals that guide the organization: break boundaries, empower researchers, redefine quality, and open science. All PLOS journals are peer-reviewed, and all 12 journals uphold rigorous ethical standards for research, publication, and scientific reporting.

PLOS does not offer advanced search options. Content is organized by topic into research communities that users can browse through, in addition to options to search for both articles and journals. The PLOS website also has resources for peer reviewers, including guidance on becoming a reviewer and on how to best participate in the peer review process.

Collection: 12 journals

Advanced Search Options: None

8. OpenDOAR

OpenDOAR, or the Directory of Open Access Repositories, is a comprehensive resource for finding free OA journals and articles. Using Google Custom Search, OpenDOAR combs through OA repositories around the world and returns relevant research in all disciplines.

The repositories it searches through are assessed and categorized by OpenDOAR staff to ensure they meet quality standards. Inclusion criteria for the database include requirements for OA content, global access, and categorically appropriate content, in addition to various other quality assurance measures. OpenDOAR has metadata, data, content, preservation, and submission policies for repositories, in addition to two OA policy statements regarding minimum and optimum recommendations.

This database allows users to browse and search repositories, which can then be selected, and articles and data can be accessed from the repository directly. As a repository database, much of the content on the site is geared toward the support of repositories and OA standards.

Collection: 5,768 repositories

Other Services: OpenDOAR offers a variety of additional services. Given the nature of the platform, services are primarily aimed at repositories and institutions, and there is a marked focus on OA in general. Sherpa services are OA archiving tools for authors and institutions.

They also offer various resources for OA support and compliance regarding standards and policies. The publication router matches publications and publishers with appropriate repositories.

There are also services and resources from JISC for repositories for cost management, discoverability, research impact, and interoperability, including ORCID consortium membership information. Additionally, a repository self-assessment tool is available for members.

Advanced Search Options:  Name, organization name, repository type, software name, content type, subject, country, region

9. Bielefeld Academic Search Engine

The Bielefeld Academic Search Engine (BASE) is operated by the Bielefeld University Library in Germany, and it offers more than 240 million documents from more than 8,000 sources. Sixty percent of its content is OA, and you can filter your search accordingly.

BASE has rigorous inclusion requirements for content providers regarding quality and relevance, and they maintain a list of content providers for the sake of transparency, which can be easily found on their website. BASE has a fairly elegant interface. Search results can be organized by author, title, or date.

From the search results, items can be selected and exported, added to favorites, emailed, and searched in Google Scholar. There are basic and advanced search features, with the advanced search offering numerous options for refining search criteria. There is also a feature on the website that saves recent searches without additional steps from the user.

Collection: 276,019,066 documents; 9,286 content providers

Advanced Search Options:  Author, subject, year, content provider, language, document type, access, terms of reuse


10. Digital Library of the Commons Repository

Run by Indiana University, the Digital Library of the Commons (DLC) Repository is a multidisciplinary journal repository that allows users to access thousands of free and OA articles from around the world. You can browse by document type, date, author, title, and more or search for keywords relevant to your topic.

The DLC also offers the Comprehensive Bibliography of the Commons, an image database, and a keyword thesaurus for enhanced search parameters. The repository includes books, book chapters, conference papers, journal articles, surveys, theses and dissertations, and working papers. The DLC advanced search features drop-down menus of search types with built-in Boolean search options.

Searches can be sorted by relevance, title, date, or submission date in ascending or descending order. Abstracts are included in selected search results, with access to full texts available, and citations can be exported from the same page. Additionally, the image database search includes tips for better search results.

Collection: 10,784

Advanced Search Options:  Author, date, title, subject, sector, region, conference

11. CIA World Factbook

The CIA World Factbook is a little different from the other resources on this list in that it is not an online journal directory or repository. It is, however, a useful free online research database for academics in a variety of disciplines.

All the information is free to access, and it provides facts about every country in the world, which are organized by category and include information about history, geography, transportation, and much more. The World Factbook can be searched by country or region, and there is also information about the world’s oceans.

This site contains resources related to the CIA as an organization rather than being a scientific journal database specifically. The site has a user interface that is easy to navigate. The site also provides a section for updates regarding changes to what information is available and how it is organized, making it easier to interact with the information you are searching for.

Collection: 266 countries

12. Paperity

Paperity boasts its status as the "first multidisciplinary aggregator of OA journals and papers." Their focus is on helping you avoid paywalls while connecting you to authoritative research. In addition to providing readers with easy access to thousands of journals, Paperity seeks to help authors reach their audiences and help journals increase their exposure to boost readership.

Paperity has journal articles for every discipline, and the database offers more than a dozen advanced search options, including the length of the paper and the number of authors. There is even an option to include, exclude, or exclusively search gray papers.

Paperity is available on mobile, with both a mobile site and the Paperity Reader, an app available for both Android and Apple users. You can also interact with Paperity on social media; links to their Twitter and Facebook accounts are available on their homepage.

Collection: 8,837,396

Advanced Search Options: Title, abstract, journal title, journal ISSN, publisher, year of publication, number of characters, number of authors, DOI, author, affiliation, language, country, region, continent, gray papers

13. dblp Computer Science Bibliography

The dblp Computer Science Bibliography is an online index of major computer science publications. dblp was founded in 1993, though until 2010 it was a university-specific database at the University of Trier in Germany. It is currently maintained by the Schloss Dagstuhl – Leibniz Center for Informatics.

Although it provides access to both OA articles and those behind a paywall, you can limit your search to only OA articles. The site indexes more than three million publications, making it an invaluable resource in the world of computer science. dblp entries are color-coded based on the type of item.

dblp has an extensive FAQ section, so questions that might arise about topics like the database itself, navigating the website, or the data on dblp, in addition to several other topics, are likely to be answered. The website also hosts a blog and has a section devoted to website statistics.

Collection: 5,884,702

14. EconBiz

EconBiz is a great resource for economic and business studies. A service of the Leibniz Information Centre for Economics, it offers access to full texts online, with the option of searching for OA material only. Their literature search is performed across multiple international databases.

EconBiz has an incredibly useful research skills section, with resources such as Guided Walk, a service to help students and researchers navigate searches, evaluate sources, and correctly cite references; the Research Guide EconDesk, a help desk to answer specific questions and provide advice to aid in literature searches; and the Academic Career Kit for what they refer to as Early Career Researchers.

Other helpful resources include personal literature lists, a calendar of events for relevant calls for papers, conferences, and workshops, and an economics terminology thesaurus to help in finding keywords for searches. To stay up-to-date with EconBiz, you can sign up for their newsletter.

Collection: 1,075,219

Advanced Search Options:  Title, subject, author, institution, ISBN/ISSN, journal, publisher, language, OA only

15. BioMed Central

BioMed Central provides OA research from more than 300 peer-reviewed journals. While originally focused on biology and medicine, BioMed Central has branched out to include journals that cover a broader range of disciplines, with the aim of providing a single platform that provides OA articles for a variety of research needs. You can browse these journals by subject or title, or you can search all articles for your required keyword.

BioMed Central has a commitment to peer-reviewed sources and to the peer review process itself, continually seeking to improve it. They're "committed to maintaining high standards through full and stringent peer review," and they publish the journal Research Integrity and Peer Review, which publishes research on the subject.

Additionally, the website includes resources to assist and support editors as part of their commitment to providing high-quality, peer-reviewed OA articles.

Collection: 507,212

Other Services: BMC administers the International Standard Randomised Controlled Trial Number (ISRCTN) registry. While initially designed for registering clinical trials, since its creation in 2000, the registry has broadened its scope to include other health studies as well.

The registry is recognized by the International Committee of Medical Journal Editors, as well as the World Health Organization (WHO), and it meets the requirements established by the WHO International Clinical Trials Registry Platform.

The study records included in the registry are all searchable and free to access. The ISRCTN registry "supports transparency in clinical research, helps reduce selective reporting of results and ensures an unbiased and complete evidence base."

Advanced Search Options:  Author, title, journal, list

16. JURN

A multidisciplinary search engine, JURN provides links to various scholarly websites, articles, and journals that are free to access or OA. Covering the fields of the arts, humanities, business, law, nature, science, and medicine, JURN has indexed almost 5,000 repositories to help you find exactly what you're looking for.

Search features are enhanced by Google, but searches are filtered through their index of repositories. JURN seeks to reach a wide audience, with their search engine tailored to researchers from "university lecturers and students seeking a strong search tool for OA content" and "advanced and ambitious students, age 14-18" to "amateur historians and biographers" and "unemployed and retired lecturers."

That being said, JURN is very upfront about its limitations. They admit to not being a good resource for educational studies, social studies, or psychology, and conference archives are generally not included due to frequently unstable URLs.

Collection: 5,064 indexed journals

Other Services: JURN has a browser add-on called UserScript. This add-on allows users to integrate the JURN database directly into Google Search. When performing a search through Google, the add-on creates a link that sends the search directly to JURN CSE. JURN CSE is a search service that is hosted by Google.

Clicking the link from the Google Search bar will run your search through the JURN database from the Google homepage. There is also an interface for a DuckDuckGo search box; while DuckDuckGo emphasizes user privacy, it may not provide the same depth of results for the smaller sites indexed by JURN.

Advanced Search Options:  Google search modifiers

17. Dryad

Dryad is a digital repository of curated, OA scientific research data. Launched in 2009, it is run by a not-for-profit membership organization, with a community of institutional and publisher members for whom their services have been designed. Members include institutions such as Stanford, UCLA, and Yale, as well as publishers like Oxford University Press and Wiley.

Dryad aims to "promote a world where research data is openly available, integrated with the scholarly literature, and routinely reused to create knowledge." It is free to access for the search and discovery of data. Their user experience is geared toward easy self-depositing, supports Creative Commons licensing, and provides DOIs for all their content.

Note that there is a publishing charge associated if you wish to publish your data in Dryad. When searching datasets, they are accompanied by author information and abstracts for the associated studies, and citation information is provided for easy attribution.

Collection: 44,458

Advanced Search Options: None

18. E-Theses Online Service

Run by the British Library, the E-Theses Online Service (EThOS) allows you to search over 500,000 doctoral theses in a variety of disciplines. All of the doctoral theses available on EThOS have been awarded by higher education institutions in the United Kingdom.

Although some full texts are behind paywalls, you can limit your search to items available for immediate download, either directly through EThOS or through an institution's website. More than half of the records in the database provide access to full-text theses.

EThOS notes that they do not hold all records for all institutions, but they strive to index as many doctoral theses as possible, and the database is constantly expanding, with approximately 3,000 new records added and 2,000 new full-text theses available every month. The availability of full-text theses is dependent on multiple factors, including their availability in the institutional repository and the level of repository development.

Collection: 500,000+

Advanced Search Options:  Abstract, author's first name, author's last name, awarding body, current institution, EThOS ID, year, language, qualifications, research supervisor, sponsor/funder, keyword, title

19. PubMed

PubMed is a research platform well-known in the fields of science and medicine. It was created and developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM). It has been available since 1996 and offers access to "more than 33 million citations for biomedical literature from MEDLINE, life science journals, and online books."

PubMed itself does not host full-text articles, and many of the articles it indexes are behind paywalls or require subscriptions. When an article is freely available, such as through PubMed Central (PMC), a link to the full text is provided alongside the citation and abstract.

PMC, which was established in 2000 by the NLM, is a free full-text archive that includes more than 6,000,000 records. PubMed records link directly to corresponding PMC results. PMC content is provided by publishers and other content owners, digitization projects, and authors directly.

Collection: 33,000,000+

Advanced Search Options: Author's first name, author's last name, identifier, corporation, date completed, date created, date entered, date modified, date published, MeSH, book, conflict of interest statement, EC/RN number, editor, filter, grant number, page number, pharmacological action, volume, publication type, publisher, secondary source ID, text, title, abstract, transliterated title
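PubMed's records can also be queried programmatically through NCBI's E-utilities web service. The sketch below builds an ESearch URL that returns matching PubMed IDs as JSON; the query string is just an illustration, and you should check NCBI's current E-utilities documentation and usage limits before relying on specific parameters.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term, max_results=20):
    """Build an ESearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # query, e.g. 'open access[Title] AND 2020[PDAT]'
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap the number of IDs returned
    }
    return f"{EUTILS_BASE}?{urlencode(params)}"

url = build_pubmed_search_url('open access[Title] AND 2020[PDAT]')
print(url)
```

Fetching that URL (with any standard HTTP client) returns a JSON list of PubMed IDs, which can then be passed to other E-utilities endpoints to retrieve citation details.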

20. Semantic Scholar

A unique and easy-to-use resource, Semantic Scholar defines itself not just as a research database but also as a "search and discovery tool." Semantic Scholar harnesses the power of artificial intelligence to efficiently sort through millions of science-related papers based on your search terms.

Through this singular application of machine learning, Semantic Scholar expands search results to include topic overviews based on your search terms, with the option to create an alert for or further explore the topic. It also provides links to related topics.

In addition, search results include "TLDR" summaries: concise overviews of articles that help you navigate the available literature quickly and easily and find the most relevant information. According to the site, although some articles are behind paywalls, "the data [they] have for those articles is limited," so you can expect to receive mostly full-text results.

Collection: 203,379,033

Other Services: Semantic Scholar supports multiple popular browsers. Content can be accessed through both mobile and desktop versions of Firefox, Microsoft Edge, Google Chrome, Apple Safari, and Opera.

Additionally, Semantic Scholar provides browser extensions for both Chrome and Firefox, so AI-powered scholarly search results are never more than a click away. The mobile interface includes an option for Semantic Swipe, a new way of interacting with your research results.

There are also beta features that can be accessed as part of the Beta Program, which will provide you with features that are being actively developed and require user feedback for further improvement.

Advanced Search Options: Field of study, date range, publication type, author, journal, conference, PDF

21. Zenodo

Zenodo, powered by the European Organization for Nuclear Research (CERN), was launched in 2013. Taking its name from Zenodotus, the first librarian of the ancient library of Alexandria, Zenodo is a tool "built and developed by researchers, to ensure that everyone can join in open science." Zenodo accepts all research from every discipline in any file format.

However, Zenodo also curates uploads and promotes peer-reviewed material that is available through OA. A DOI is assigned to everything that is uploaded to Zenodo, making research easily findable and citable. You can sort by keyword, title, journal, and more and download OA documents directly from the site.

While there are closed access and restricted access items in the database, the vast majority of research is OA material. Search results can be filtered by access type, making it easy to view the free articles available in the database.

Collection: 2,220,000+

Advanced Search Options:  Access, file type, keywords

Check out our roundup of free research databases as a handy one-page PDF.

How to Find Peer-Reviewed Articles

There are a lot of free scholarly articles available from various sources. The internet is a big place. So how do you go about finding peer-reviewed articles when conducting your research? It's important to make sure you are using reputable sources.

Start with the author: checking out the person or people who wrote the article can give you some initial insight into how much you can trust what you're reading. Looking into the publication information of your sources can also indicate whether the article is reliable.

Aspects of the article, such as subject and audience, tone, and format, are other things you can look at when evaluating whether the article you're using is valid, reputable, peer-reviewed material. So, let's break that down into various components so you can assess your research to ensure that you're using quality articles and conducting solid research.

Check the Author

Peer-reviewed articles are written by experts or scholars with experience in the field or discipline they're writing about. The research in a peer-reviewed article has to pass a rigorous evaluation process, so the author(s) of a peer-reviewed article can be expected to have experience or training related to that research.

When evaluating an article, take a look at the author’s information. What credentials does the author have to indicate that their research has scholarly weight behind it? Finding out what type of degree the author has—and what that degree is in—can provide insight into what kind of authority the author is on the subject.

Something else that might lend credence to the author’s scholarly role is their professional affiliation. A look at what organization or institution they are affiliated with can tell you a lot about their experience or expertise. Where were they trained, and who is verifying their research?

Identify Subject and Audience

The ultimate goal of a study is to answer a question. Scholarly articles are written for scholarly audiences, especially articles that have gone through the peer review process. This means that the author is trying to reach experts, researchers, academics, and students in the field or topic the research is based on.

Think about the question the author is trying to answer by conducting this research, why, and for whom. What is the subject of the article? What question has it set out to answer? What is the purpose of finding the information? Is the purpose of the article of importance to other scholars? Is it original content?

Research should also be approached analytically. Is the methodology sound? Is the author using an analytical approach to evaluate the data that they have obtained? Are the conclusions they've reached substantiated by their data and analysis? Answering these questions can reveal a lot about the article’s validity.

Format Matters

Reliable articles from peer-reviewed sources have certain format elements to be aware of. The first is an abstract. An abstract is a short summary or overview of the article. Does the article have an abstract? It's unlikely that you're reading a peer-reviewed article if it doesn’t. Peer-reviewed journals will also have a word count range. If an article seems far too short or incredibly long, that may be reason to doubt it.

Another feature of reliable articles is the sections the information is divided into. Peer-reviewed research articles will have clear, concise sections that appropriately organize the information. For research articles, this might include a literature review, methodology, results, and a conclusion.

One of the most important sections is the references or bibliography. This is where the researcher lists all the sources of their information. A peer-reviewed source will have a comprehensive reference section.

Consider the Tone

An article that has been written to reach an academic community will have an academic tone. The language used, and the way it is used, is important to consider. If the article is riddled with grammatical errors, confusing syntax, and casual language, it almost definitely didn't make it through the peer review process.

Also consider the use of terminology. Every discipline is going to have standard terminology or jargon that can be used and understood by other academics in the discipline. The language in a peer-reviewed article is going to reflect that.

If the author is going out of their way to explain simple terms, or terms that are standard to the field or discipline, it's unlikely that the article has been peer reviewed, as this is something that the author would be asked to address during the review process.

Publication

The source of the article will be a very good indicator of the likelihood that it was peer reviewed. Where was the article published? Was it published alongside other academic articles in the same discipline? Is it a legitimate and reputable scholarly publication?

A trade publication or newspaper might be legitimate or reputable, but it is not a scholarly source, and it will not have been subject to the peer review process. Scholarly journals are the best resource for peer-reviewed articles, but it's important to remember that not all scholarly journals are peer reviewed.

It’s helpful to look at a scholarly source’s website, as peer-reviewed journals will have a clear indication of the peer review process. University libraries, institutional repositories, and reliable databases (and you now might have a list of some legit ones) can also help provide insight into whether an article comes from a peer-reviewed journal.


Common Research Mistakes to Avoid

Research is a lot of work. Even with high standards and good intentions, it’s easy to make mistakes. Perhaps you searched for access to scientific journals for free and found the perfect peer-reviewed sources, but you forgot to document everything, and your references are a mess. Or, you only searched for free online articles and missed out on a ground-breaking study that was behind a paywall.

Whether your research is for a degree or to get published or to satisfy your own inquisitive nature, or all of the above, you want all that work to produce quality results. You want your research to be thorough and accurate.

To have any hope of contributing to the literature on your research topic, your results need to be high quality. You might not be able to avoid every potential mistake, but here are some that are both common and easy to avoid.

Sticking to One Source

One of the hallmarks of good research is a healthy reference section. Using a variety of sources gives you a better answer to your question. Even if all of the literature is in agreement, looking at various aspects of the topic may provide you with an entirely different picture than you would have if you looked at your research question from only one angle.

Not Documenting Every Fact

As you conduct your research, do yourself a favor and write everything down. Everything you include in your paper or article that you got from another source is going to need to be added to your references and cited.

It's important, especially if your aim is to conduct ethical, high-quality research, that all of your research has proper attribution. If you don't document as you go, you could end up creating a lot of extra work for yourself when, later on, you need a piece of information you never wrote down.

Using Outdated Materials

Academia is an ever-changing landscape. What was true in your academic discipline or area of research ten years ago may have since been disproven. If fifteen studies have come out since the article you're using was published, it's very likely that you're basing your research on flawed or dated information.

If the information you're basing your research on isn't as up to date as possible, your research won't be high quality or able to stand up to scrutiny. You don't want all of your hard work to be for naught.

Relying Solely on Open Access Journals

OA is a great resource for conducting academic research. There are high-quality journal articles available through OA, and that can be very helpful for your research. But, just because you have access to free articles, that doesn't mean that there's nothing to be found behind a paywall.

Just as dismissing high-quality peer-reviewed articles because they are OA would be limiting, not exploring any paid content at all is equally short-sighted. If you're seeking to conduct thorough and comprehensive research, exploring all of your options for quality sources is going to be to your benefit.

Digging Too Deep or Not Deep Enough

Research is an art form, and it involves a delicate balance of information. If you conduct your research using only broad search terms, you won't be able to answer your research question well: your findings will be closely related to your topic but, ultimately, vague and unsubstantiated.

On the other hand, if you delve into your research topic with searches that are too specific, you might turn up a lot of information that is adjacent to your topic but unfocused and perhaps not entirely relevant. It's important to answer your research question concisely but thoroughly.

Different Types of Scholarly Articles

Different types of scholarly articles have different purposes. An original research article, also called an empirical article, is the product of a study or an experiment. This type of article seeks to answer a question or fill a gap in the existing literature.

Research articles will have a methodology, results, a discussion of the findings of the experiment or research, and typically a conclusion.

Review articles overview the current literature and research and provide a summary of what the existing research indicates or has concluded. This type of study will have a section for the literature review, as well as a discussion of the findings of that review. Review articles will have a particularly extensive reference or bibliography section.

Theoretical articles draw on existing literature to create new theories or conclusions, or look at current theories from a different perspective, to contribute to the foundational knowledge of the field of study.

10 Tips for Navigating Journal Databases

1. Use the right academic journal database for your search, be that interdisciplinary or specific to your field. Or both!

2. If it’s an option, set the search results to return only peer-reviewed sources.

3. Start by using search terms that are relevant to your topic without being overly specific.

4. Try synonyms, especially if your keywords aren’t returning the desired results.

5. Even if you’ve found some good articles, try searching using different terms.

6. Explore the advanced search features of the database(s).

7. Learn to use Booleans (AND, OR, NOT) to expand or narrow your results.

8. Once you’ve gotten some good results from a more general search, try narrowing your search.

9. Read through abstracts when trying to find articles relevant to your research.

10. Keep track of your research and use citation tools. It’ll make life easier when it comes time to compile your references.
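On the citation-tool tip: real citation managers (such as Zotero) handle the many edge cases of citation styles for you, but as a toy illustration of what they automate, a reference string can be assembled from metadata like this (the article details below are made up):

```python
def format_reference(authors, year, title, journal, volume, pages):
    """Format a journal-article reference in a simplified APA-like style.

    A toy sketch only: real citation tools handle many more source
    types, edge cases, and citation styles.
    """
    return f"{authors} ({year}). {title}. {journal}, {volume}, {pages}."

# Hypothetical article metadata, for illustration only
ref = format_reference(
    authors="Smith, J., & Lee, K.",
    year=2020,
    title="Open access and research practice",
    journal="Journal of Scholarly Publishing",
    volume=51,
    pages="1-18",
)
print(ref)
# Smith, J., & Lee, K. (2020). Open access and research practice. Journal of Scholarly Publishing, 51, 1-18.
```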

7 Frequently Asked Questions

1. How Do I Get Articles for Free?

Free articles can be found through free online academic journals, OA databases, or other databases that include OA journals and articles. These resources allow you to access free papers online so you can conduct your research without getting stuck behind a paywall.

Academics don’t receive payment for the articles they contribute to journals. There are often, in fact, publication fees that scholars pay in order to publish. This is one of the funding structures that allows OA journals to provide free content so that you don’t have to pay fees or subscription costs to access journal articles.

2. How Do I Find Journal Articles?

Journal articles can be found in databases and institutional repositories that can be accessed at university libraries. However, online research databases that contain OA articles are the best resource for getting free access to journal articles that are available online.

Peer-reviewed journal articles are the best to use for academic research, and there are a number of databases where you can find peer-reviewed OA journal articles. Once you've found a useful article, you can look through the references for the articles the author used to conduct their research, and you can then search online databases for those articles, too.

3. How Do I Find Peer-Reviewed Articles?

Peer-reviewed articles can be found in reputable scholarly peer-reviewed journals. High-quality journals and journal articles can be found online using academic search engines and free research databases. These resources are excellent for finding OA articles, including peer-reviewed articles.

OA articles are articles that can be accessed for free. While some scholarly search engines and databases include articles that aren't peer reviewed, there are also some that provide only peer-reviewed articles, and databases that include non-peer-reviewed articles often have advanced search features that enable you to select “peer review only.” The database will return results that are exclusively peer-reviewed content.

4. What Are Research Databases?

A research database is a list of journals, articles, datasets, and/or abstracts that allows you to easily search for scholarly and academic resources and conduct research online. There are databases that are interdisciplinary and cover a variety of topics.

For example, Paperity might be a great resource for a chemist as well as a linguist, and there are databases that are more specific to a certain field. So, while ERIC might be one of the best educational databases available for OA content, it's not going to be one of the best databases for finding research in the field of microbiology.

5. How Do I Find Scholarly Articles for Specific Fields?

There are interdisciplinary research databases that provide articles in a variety of fields, as well as research databases that provide articles that cater to specific disciplines. Additionally, a journal repository or index can be a helpful resource for finding articles in a specific field.

When searching an interdisciplinary database, there are frequently advanced search features that allow you to narrow the search results down so that they are specific to your field. Selecting “psychology” in the advanced search features will return psychology journal articles in your search results. You can also try databases that are specific to your field.

If you're searching for law journal articles, many law reviews are OA. If you don’t know of any databases specific to history, visiting a journal repository or index and searching “history academic journals” can return a list of journals specific to history and provide you with a place to begin your research.

6. Are Peer-Reviewed Articles Really More Legitimate?

The short answer is yes, peer-reviewed articles are more legitimate resources for academic research. The peer review process provides legitimacy, as it is a rigorous review of the content of an article that is performed by scholars and academics who are experts in their field of study. The review provides an evaluation of the quality and credibility of the article.

Non-peer-reviewed articles are not subject to a review process and do not undergo the same level of scrutiny. This means that non-peer-reviewed articles are unlikely, or at least not as likely, to meet the same standards that peer-reviewed articles do.

7. Are Free Article Directories Legitimate?

Yes! As with anything, some databases are going to be better for certain requirements than others. But, a scholarly article database being free is not a reason in itself to question its legitimacy.

Free scholarly article databases can provide access to abstracts, scholarly article websites, journal repositories, and high-quality peer-reviewed journal articles. The internet has a lot of information, and it's often challenging to figure out what information is reliable. 

Research databases and article directories are great resources to help you conduct your research. Our list of the best research paper websites is sure to provide you with sources that are totally legit.

Get Professional Academic Editing

Hire an expert academic editor, or get a free sample.

About the Author

Scribendi Editing and Proofreading

Scribendi's in-house editors work with writers from all over the globe to perfect their writing. They know that no piece of writing is complete without a professional edit, and they love to see a good piece of writing transformed into a great one. Scribendi's in-house editors are unrivaled in both experience and education, having collectively edited millions of words and obtained numerous degrees. They love consuming caffeinated beverages, reading books of various genres, and relaxing in quiet, dimly lit spaces.

Have You Read?

"The Complete Beginner's Guide to Academic Writing"

Related Posts

How to Write a Research Proposal

How to Write a Scientific Paper

How to Write a Thesis or Dissertation


10 Free Research and Journal Databases

6th April 2019

Finding good research can be tough, especially when so much of it is locked behind paywalls . But there are free resources out there if you know where to look. So to help out, we’ve compiled a list of ten free academic search engines and databases that you should check out.

1. Google Scholar

Even if you’ve not used Google Scholar before, you’ll know Google. And, thus, you can probably guess that Google Scholar is a search engine dedicated to academic work. Not everything listed on Google Scholar will be freely available in full. But it is a good place to start if you’re looking for a specific paper, and many papers can be downloaded for free.

2. CORE

CORE is an open research aggregator. This means it works as a search engine for open access research published by organizations from around the world, all of which is available for free. It is also the world’s largest open access aggregator , so it is a very useful resource for researchers!


3. Bielefeld Academic Search Engine (BASE)

Another dedicated academic search engine, BASE offers access to more than 140 million documents from more than 6,000 sources. Around 60% of these documents are open access, and you can filter results to see only research that is available for free online.

4. Directory of Open Access Journals (DOAJ)

The Directory of Open Access Journals (DOAJ) is a database that lists around 12,000 open access journals covering all areas of science, technology, medicine, social science, and the humanities.

5. PubMed

PubMed is a search engine maintained by the NCBI, part of the United States National Library of Medicine. It provides access to more than 29 million citations of biomedical research from MEDLINE, life science journals, and online books. The NCBI runs a similar search engine for research in the chemical sciences called PubChem , too, which is also free to use.


6. E-Theses Online Service (EThOS)

Run by the British Library, EThOS is a database of over 500,000 doctoral theses. More than half of these are available for free, either directly via EThOS or via a link to a university website.

7. Social Science Research Network (SSRN)

SSRN is a database for research from the social sciences and humanities, including 846,589 research papers from 426,107 researchers across 30 disciplines. Most of these are available for free, although you may need to sign up as a member (also free) to access some services.

8. WorldWideScience

WorldWideScience is a global academic search engine, providing access to national and international scientific databases from across the globe. One interesting feature is that it offers automatic translation, so users can have search results translated into their preferred language.


9. Semantic Scholar

Semantic Scholar is an “intelligent” academic search engine. It uses machine learning to prioritize the most important research, which can make it easier to find relevant literature. Or, in Semantic Scholar’s own words, it uses influential citations, images, and key phrases to “cut through the clutter.”

10. Public Library of Science (PLOS)

PLOS is an open-access research organization that publishes several journals. But as well as publishing its own research, PLOS is a dedicated advocate for open-access learning. So if you appreciate the search engines and databases we’ve listed here, check out the rest of the PLOS site to find out more about their campaign to enable access to knowledge.


How to Find Sources | Scholarly Articles, Books, Etc.

Published on June 13, 2022 by Eoghan Ryan . Revised on May 31, 2023.

It’s important to know how to find relevant sources when writing a  research paper , literature review , or systematic review .

The types of sources you need will depend on the stage you are at in the research process , but all sources that you use should be credible , up to date, and relevant to your research topic.

There are three main places to look for sources to use in your research:

  • Research databases
  • Your institution’s library
  • Other online resources


You can search for scholarly sources online using databases and search engines like Google Scholar . These provide a range of search functions that can help you to find the most relevant sources.

If you are searching for a specific article or book, include the title or the author’s name. Alternatively, if you’re just looking for sources related to your research problem , you can search using keywords. In this case, it’s important to have a clear understanding of the scope of your project and of the most relevant keywords.

Databases can be general (interdisciplinary) or subject-specific.

  • You can use subject-specific databases to ensure that the results are relevant to your field.
  • When using a general database or search engine, you can still filter results by selecting specific subjects or disciplines.

Example: JSTOR discipline search filter


Check the table below to find a database that’s relevant to your research.

Google Scholar

To get started, you might also try Google Scholar , an academic search engine that can help you find relevant books and articles. Its “Cited by” function lets you see the number of times a source has been cited. This can tell you something about a source’s credibility and importance to the field.

Example: Google Scholar “Cited by” function


Boolean operators

Boolean operators can also help to narrow or expand your search.

Boolean operators are words and symbols like AND , OR , and NOT that you can use to include or exclude keywords to refine your results. For example, a search for “Nietzsche NOT nihilism” will provide results that include the word “Nietzsche” but exclude results that contain the word “nihilism.”

Many databases and search engines have an advanced search function that allows you to refine results in a similar way without typing the Boolean operators manually.
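To see why these operators are useful, here is a minimal Python sketch of the same logic applied to a handful of made-up titles, with AND, OR, and NOT mapped to `must`, `should`, and `must_not` keyword groups:

```python
def matches(text, must=(), should=(), must_not=()):
    """Return True if `text` satisfies the given Boolean keyword logic.

    must     -> AND: every keyword must appear
    should   -> OR:  at least one keyword must appear (if any are given)
    must_not -> NOT: no keyword may appear
    """
    t = text.lower()
    return (all(k.lower() in t for k in must)
            and (not should or any(k.lower() in t for k in should))
            and not any(k.lower() in t for k in must_not))

# Invented titles, for illustration only
titles = [
    "Nietzsche and the Question of Nihilism",
    "Nietzsche on Art and Truth",
    "Kierkegaard's Concept of Anxiety",
]

# Equivalent to the query "Nietzsche NOT nihilism"
results = [t for t in titles if matches(t, must=("nietzsche",), must_not=("nihilism",))]
print(results)  # ['Nietzsche on Art and Truth']
```

A database's advanced search form is doing essentially this filtering for you, only over millions of records and full-text indexes rather than a short list of titles.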

Example: Project Muse advanced search



You can find helpful print sources in your institution’s library. These include:

  • Journal articles
  • Encyclopedias
  • Newspapers and magazines

Make sure that the sources you consult are appropriate to your research.

You can find these sources using your institution’s library database. This will allow you to explore the library’s catalog and to search relevant keywords. You can refine your results using Boolean operators .

Once you have found a relevant print source in the library:

  • Consider what books are beside it. This can be a great way to find related sources, especially when you’ve found a secondary or tertiary source instead of a primary source .
  • Consult the index and bibliography to find the bibliographic information of other relevant sources.

You can consult popular online sources to learn more about your topic. These include:

  • Crowdsourced encyclopedias like Wikipedia

You can find these sources using search engines. To refine your search, use Boolean operators in combination with relevant keywords.

However, exercise caution when using online sources. Consider what kinds of sources are appropriate for your research and make sure the sites are credible.

Look for sites with trusted domain extensions:

  • URLs that end with .edu are educational resources.
  • URLs that end with .gov are government-related resources.
  • DOIs often indicate that an article is published in a peer-reviewed scientific journal.

Other sites can still be used, but you should evaluate them carefully and consider alternatives.
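As a rough illustration of the domain-extension heuristic above, here is a short Python sketch; the URLs are hypothetical examples, and a trusted extension is only a hint, never proof of credibility:

```python
from urllib.parse import urlparse

# Rough heuristic only: .edu and .gov suggest an institutional source,
# but every site still deserves careful evaluation.
TRUSTED_SUFFIXES = (".edu", ".gov")

def domain_hint(url):
    """Return a coarse credibility hint based on the URL's domain extension."""
    host = urlparse(url).hostname or ""
    if host.endswith(TRUSTED_SUFFIXES):
        return "likely institutional"
    return "evaluate carefully"

# Hypothetical example URLs
print(domain_hint("https://www.harvard.edu/library"))  # likely institutional
print(domain_hint("https://example.com/blog-post"))    # evaluate carefully
```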

If you want to know more about ChatGPT, AI tools, citation, and plagiarism, make sure to check out some of our other articles with explanations and examples.

  • ChatGPT vs human editor
  • ChatGPT citations
  • Is ChatGPT trustworthy?
  • Using ChatGPT for your studies
  • What is ChatGPT?
  • Chicago style
  • Paraphrasing

Plagiarism

  • Types of plagiarism
  • Self-plagiarism
  • Avoiding plagiarism
  • Academic integrity
  • Consequences of plagiarism
  • Common knowledge


You can find sources online using databases and search engines like Google Scholar. Use Boolean operators or advanced search functions to narrow or expand your search.

For print sources, you can use your institution’s library database. This will allow you to explore the library’s catalog and to search relevant keywords.

It is important to find credible sources and use those that you can be sure are sufficiently scholarly.

  • Consult your institution’s library to find out what books, journals, research databases, and other types of sources they provide access to.
  • Look for books published by respected academic publishing houses and university presses, as these are typically considered trustworthy sources.
  • Look for journals that use a peer review process. This means that experts in the field assess the quality and credibility of an article before it is published.

When searching for sources in databases, think of specific keywords that are relevant to your topic, and consider variations on them or synonyms that might be relevant.

Once you have a clear idea of your research parameters and key terms, choose a database that is relevant to your research (e.g., Medline, JSTOR, Project MUSE).

Find out if the database has a “subject search” option. This can help to refine your search. Use Boolean operators to combine your keywords, exclude specific search terms, and search exact phrases to find the most relevant sources.

There are many types of sources commonly used in research. These include:

You’ll likely use a variety of these sources throughout the research process , and the kinds of sources you use will depend on your research topic and goals.

Scholarly sources are written by experts in their field and are typically subjected to peer review. They are intended for a scholarly audience, include a full bibliography, and use scholarly or technical language. For these reasons, they are typically considered credible sources.

Popular sources like magazines and news articles are typically written by journalists. These types of sources usually don’t include a bibliography and are written for a popular, rather than academic, audience. They are not always reliable and may be written from a biased or uninformed perspective, but they can still be cited in some contexts.

Cite this Scribbr article


Ryan, E. (2023, May 31). How to Find Sources | Scholarly Articles, Books, Etc. Scribbr. Retrieved February 22, 2024, from https://www.scribbr.com/working-with-sources/finding-sources/

Eoghan Ryan


PubMed® comprises more than 36 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full text content from PubMed Central and publisher web sites.
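PubMed can also be searched programmatically through NCBI’s public E-utilities API. The sketch below only builds an `esearch` request URL (the query term is an arbitrary example); fetching that URL returns matching PubMed IDs as JSON:

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint for querying PubMed.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, retmax=20):
    """Build an esearch URL that returns up to `retmax` PubMed IDs as JSON."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # query string; Boolean operators are allowed
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # JSON instead of the default XML
    }
    return f"{BASE}?{urlencode(params)}"

# Hypothetical example query; no network request is made here.
url = build_esearch_url("migration AND public health")
print(url)
```

Heavy users should consult NCBI’s usage guidelines (rate limits and API keys) before issuing many requests.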



Database Search

What is Database Search?

Harvard Library licenses hundreds of online databases, giving you access to academic and news articles, books, journals, primary sources, streaming media, and much more.

The contents of these databases are only partially included in HOLLIS. To make sure you're really seeing everything, you need to search in multiple places. Use Database Search to identify and connect to the best databases for your topic.

In addition to digital content, you will find specialized search engines used in specific scholarly domains.


Free Databases

EBSCO provides free research databases covering a variety of subjects for students, researchers and librarians.

Exploring Race in Society

This free research database offers essential content covering important issues related to race in society today. Essays, articles, reports and other reliable sources provide an in-depth look at the history of race and provide critical context for learning more about topics associated with race, ethnicity, diversity and inclusiveness.

EBSCO Open Dissertations

EBSCO Open Dissertations is a collaboration between EBSCO and BiblioLabs to increase traffic and discoverability of ETD research. You can join the movement and add your theses and dissertations to the database, making them freely available to researchers everywhere.

GreenFILE is a free research database covering all aspects of human impact on the environment. Its collection of scholarly, government, and general-interest titles includes content on global warming, green building, pollution, sustainable agriculture, renewable energy, recycling, and more.

Library, Information Science and Technology Abstracts

Library, Information Science & Technology Abstracts (LISTA) is a free research database for library and information science studies. LISTA provides indexing and abstracting for hundreds of key journals, books, and research reports. It is EBSCO’s intention to provide access to this resource on a continual basis.

Teacher Reference Center

A complimentary research database for teachers, Teacher Reference Center (TRC) provides indexing and abstracts for more than 230 peer-reviewed journals.

European Views of the Americas: 1493 to 1750

European Views of the Americas: 1493 to 1750 is a free archive of indexed publications related to the Americas and written in Europe before 1750. It includes thousands of valuable primary source records covering the history of European exploration as well as portrayals of Native American peoples.



Download 47 million PDFs for free

Explore our top research interests:

  • Engineering
  • Anthropology
  • Earth Sciences
  • Computer Science
  • Mathematics
  • Health Sciences
Join 253 million academics and researchers

Accelerate your research and streamline your discovery of relevant work.

Get access to 47+ million research papers and stay informed with important topics through courses.

Grow Your Audience

Build your success and track your impact.

Share your work with other academics, grow your audience, and track your impact on your field with our robust analytics.

Unlock the most powerful tools with Academia Premium


Work faster and smarter with advanced research discovery tools

Search the full text and citations of our millions of papers. Download groups of related papers to jumpstart your research. Save time with detailed summaries and search alerts.

  • Advanced Search
  • PDF Packages of 37 papers
  • Summaries and Search Alerts


Share your work, track your impact, and grow your audience

Get notified when other academics mention you or cite your papers. Track your impact with in-depth analytics and network with members of your field.

  • Mentions and Citations Tracking
  • Advanced Analytics
  • Publishing Tools

Real stories from real people


Used by academics at over 16,000 universities


Get started and find the best quality research


UNC Charlotte Homepage

  • Top Ten Databases

  • Academic Search Complete: A great database to get started with for your research on any topic. Use it to search for articles from scholarly (peer-reviewed) journals, newspapers, and magazines.
  • Business Source Complete: Contains full-text content and peer-reviewed business journals covering all disciplines of business, including marketing, management, accounting, banking, and finance.
  • CINAHL Complete: Provides citations and full-text articles primarily for nursing and allied health professionals. Coverage from 1937 to present.
  • ERIC Database (via EBSCOhost): Provides a comprehensive bibliographic and full-text database of education research and information for educators, researchers, and the general public.
  • JSTOR: Provides access to academic journals and books covering a wide range of disciplines; it also includes some limited primary source collections.
  • PsycINFO: Provides peer-reviewed literature in behavioral science and mental health and is produced by the American Psychological Association.
  • PubMed: Contains millions of citations for biomedical and health literature from MEDLINE and other sources. To access full text from Atkins Library's journal collections in addition to full-text content from PubMed Central and open access publications, use the links provided on Atkins Library web pages.
  • Web of Science: This multidisciplinary database includes a citation mapping feature that allows you to track research across time, including almost 1.7 billion cited references, allowing for comprehensive searches.
  • Communication and Mass Media Complete: Offers full text, indexing, and abstracts for many top communication journals covering all related disciplines, including media studies, linguistics, rhetoric, and discourse.
  • ABI/INFORM Collection: Contains thousands of journals and offers full-text titles covering business and economic conditions, corporate strategies, management techniques, and competitive and product information.
  • Last Updated: Nov 16, 2023 1:25 PM
  • URL: https://guides.library.charlotte.edu/topten

  • Open access
  • Published: 19 February 2024

Genomic data in the All of Us Research Program

The All of Us Research Program Genomics Investigators

Nature (2024)

62k Accesses

1 Citation

546 Altmetric


  • Genetic variation
  • Genome-wide association studies

Comprehensively mapping the genetic basis of human disease across diverse individuals is a long-standing goal for the field of human genetics 1 , 2 , 3 , 4 . The All of Us Research Program is a longitudinal cohort study aiming to enrol a diverse group of at least one million individuals across the USA to accelerate biomedical research and improve human health 5 , 6 . Here we describe the programme’s genomics data release of 245,388 clinical-grade genome sequences. This resource is unique in its diversity as 77% of participants are from communities that are historically under-represented in biomedical research and 46% are individuals from under-represented racial and ethnic minorities. All of Us identified more than 1 billion genetic variants, including more than 275 million previously unreported genetic variants, more than 3.9 million of which had coding consequences. Leveraging linkage between genomic data and the longitudinal electronic health record, we evaluated 3,724 genetic variants associated with 117 diseases and found high replication rates across both participants of European ancestry and participants of African ancestry. Summary-level data are publicly available, and individual-level data can be accessed by researchers through the All of Us Researcher Workbench using a unique data passport model with a median time from initial researcher registration to data access of 29 hours. We anticipate that this diverse dataset will advance the promise of genomic medicine for all.

Comprehensively identifying genetic variation and cataloguing its contribution to health and disease, in conjunction with environmental and lifestyle factors, is a central goal of human health research 1 , 2 . A key limitation in efforts to build this catalogue has been the historic under-representation of large subsets of individuals in biomedical research including individuals from diverse ancestries, individuals with disabilities and individuals from disadvantaged backgrounds 3 , 4 . The All of Us Research Program (All of Us) aims to address this gap by enrolling and collecting comprehensive health data on at least one million individuals who reflect the diversity across the USA 5 , 6 . An essential component of All of Us is the generation of whole-genome sequence (WGS) and genotyping data on one million participants. All of Us is committed to making this dataset broadly useful—not only by democratizing access to this dataset across the scientific community but also to return value to the participants themselves by returning individual DNA results, such as genetic ancestry, hereditary disease risk and pharmacogenetics according to clinical standards, to those who wish to receive these research results.

Here we describe the release of WGS data from 245,388 All of Us participants and demonstrate the impact of this high-quality data in genetic and health studies. We carried out a series of data harmonization and quality control (QC) procedures and conducted analyses characterizing the properties of the dataset including genetic ancestry and relatedness. We validated the data by replicating well-established genotype–phenotype associations including low-density lipoprotein cholesterol (LDL-C) and 117 additional diseases. These data are available through the All of Us Researcher Workbench, a cloud platform that embodies and enables programme priorities, facilitating equitable data and compute access while ensuring responsible conduct of research and protecting participant privacy through a passport data access model.

The All of Us Research Program

To accelerate health research, All of Us is committed to curating and releasing research data early and often 6 . Less than five years after national enrolment began in 2018, this fifth data release includes data from more than 413,000 All of Us participants. Summary data are made available through a public Data Browser, and individual-level participant data are made available to researchers through the Researcher Workbench (Fig. 1a and Data availability).

Figure 1

a , The All of Us Research Hub contains a publicly accessible Data Browser for exploration of summary phenotypic and genomic data. The Researcher Workbench is a secure cloud-based environment of participant-level data in a Controlled Tier that is widely accessible to researchers. b , All of Us participants have rich phenotype data from a combination of physical measurements, survey responses, EHRs, wearables and genomic data. Dots indicate the presence of the specific data type for the given number of participants. c , Overall summary of participants under-represented in biomedical research (UBR) with data available in the Controlled Tier. The All of Us logo in a is reproduced with permission of the National Institutes of Health’s All of Us Research Program.

Participant data include a rich combination of phenotypic and genomic data (Fig. 1b ). Participants are asked to complete consent for research use of data, sharing of electronic health records (EHRs), donation of biospecimens (blood or saliva, and urine), in-person provision of physical measurements (height, weight and blood pressure) and surveys initially covering demographics, lifestyle and overall health 7 . Participants are also consented for recontact. EHR data, harmonized using the Observational Medical Outcomes Partnership Common Data Model 8 ( Methods ), are available for more than 287,000 participants (69.42%) from more than 50 health care provider organizations. The EHR dataset is longitudinal, with a quarter of participants having 10 years of EHR data (Extended Data Fig. 1 ). Data include 245,388 WGSs and genome-wide genotyping on 312,925 participants. Sequenced and genotyped individuals in this data release were not prioritized on the basis of any clinical or phenotypic feature. Notably, 99% of participants with WGS data also have survey data and physical measurements, and 84% also have EHR data. In this data release, 77% of individuals with genomic data identify with groups historically under-represented in biomedical research, including 46% who self-identify with a racial or ethnic minority group (Fig. 1c , Supplementary Table 1 and Supplementary Note ).

Scaling the All of Us infrastructure

The genomic dataset generated from All of Us participants is a resource for research and discovery and serves as the basis for return of individual health-related DNA results to participants. Consequently, the US Food and Drug Administration determined that All of Us met the criteria for a significant risk device study. As such, the entire All of Us genomics effort from sample acquisition to sequencing meets clinical laboratory standards 9 .

All of Us participants were recruited through a national network of partners, starting in 2018, as previously described 5 . Participants may enrol through All of Us-funded health care provider organizations or direct volunteer pathways, and all biospecimens, including blood and saliva, are sent to the central All of Us Biobank for processing and storage. Genomics data for this release were generated from blood-derived DNA. The programme began return of actionable genomic results in December 2022. As of April 2023, approximately 51,000 individuals were sent notifications asking whether they wanted to view their results, and approximately half have accepted. Return continues on an ongoing basis.

The All of Us Data and Research Center maintains all participant information and biospecimen ID linkage to ensure that participant confidentiality and coded identifiers (participant and aliquot level) are used to track each sample through the All of Us genomics workflow. This workflow facilitates weekly automated aliquot and plating requests to the Biobank, supplies relevant metadata for the sample shipments to the Genome Centers, and contains a feedback loop to inform action on samples that fail QC at any stage. Further, the consent status of each participant is checked before sample shipment to confirm that they are still active. Although all participants with genomic data are consented for the same general research use category, the programme accommodates different preferences for the return of genomic data to participants and only data for those individuals who have consented for return of individual health-related DNA results are distributed to the All of Us Clinical Validation Labs for further evaluation and health-related clinical reporting. All participants in All of Us that choose to get health-related DNA results have the option to schedule a genetic counselling appointment to discuss their results. Individuals with positive findings who choose to obtain results are required to schedule an appointment with a genetic counsellor to receive those findings.

Genome sequencing

To satisfy the requirements for clinical accuracy, precision and consistency across DNA sample extraction and sequencing, the All of Us Genome Centers and Biobank harmonized laboratory protocols, established standard QC methodologies and metrics, and conducted a series of validation experiments using previously characterized clinical samples and commercially available reference standards 9 . Briefly, PCR-free barcoded WGS libraries were constructed with the Illumina Kapa HyperPrep kit. Libraries were pooled and sequenced on the Illumina NovaSeq 6000 instrument. After demultiplexing, initial QC analysis is performed with the Illumina DRAGEN pipeline (Supplementary Table 2 ) leveraging lane, library, flow cell, barcode and sample level metrics as well as assessing contamination, mapping quality and concordance to genotyping array data independently processed from a different aliquot of DNA. The Genome Centers use these metrics to determine whether each sample meets programme specifications and then submits sequencing data to the Data and Research Center for further QC, joint calling and distribution to the research community ( Methods ).

This effort to harmonize sequencing methods, multi-level QC and use of identical data processing protocols mitigated the variability in sequencing location and protocols that often leads to batch effects in large genomic datasets 9 . As a result, the data are not only of clinical-grade quality, but also consistent in coverage (≥30× mean) and uniformity across Genome Centers (Supplementary Figs. 1 – 5 ).

Joint calling and variant discovery

We carried out joint calling across the entire All of Us WGS dataset (Extended Data Fig. 2 ). Joint calling leverages information across samples to prune artefact variants, which increases sensitivity, and enables flagging samples with potential issues that were missed during single-sample QC 10 (Supplementary Table 3 ). Scaling conventional approaches to whole-genome joint calling beyond 50,000 individuals is a notable computational challenge 11 , 12 . To address this, we developed a new cloud variant storage solution, the Genomic Variant Store (GVS), which is based on a schema designed for querying and rendering variants in which the variants are stored in GVS and rendered to an analysable variant file, as opposed to the variant file being the primary storage mechanism (Code availability). We carried out QC on the joint call set on the basis of the approach developed for gnomAD 3.1 (ref.  13 ). This included flagging samples with outlying values in eight metrics (Supplementary Table 4 , Supplementary Fig. 2 and Methods ).

To calculate the sensitivity and precision of the joint call dataset, we included four well-characterized samples. We sequenced the National Institute of Standards and Technology reference materials (DNA samples) from the Genome in a Bottle consortium 13 and carried out variant calling as described above. We used the corresponding published set of variant calls for each sample as the ground truth in our sensitivity and precision calculations 14 . The overall sensitivity for single-nucleotide variants was over 98.7% and precision was more than 99.9%. For short insertions or deletions, the sensitivity was over 97% and precision was more than 99.6% (Supplementary Table 5 and Methods ).

The joint call set included more than 1 billion genetic variants. We annotated the joint call dataset on the basis of functional annotation (for example, gene symbol and protein change) using Illumina Nirvana 15 . We defined coding variants as those inducing an amino acid change on a canonical ENSEMBL transcript and found 272,051,104 non-coding and 3,913,722 coding variants that have not been described previously in dbSNP 16 v153 (Extended Data Table 1 ). A total of 3,912,832 (99.98%) of the coding variants are rare (allelic frequency < 0.01) and the remaining 883 (0.02%) are common (allelic frequency > 0.01). Of the coding variants, 454 (0.01%) are common in one or more of the non-European computed ancestries in All of Us, rare among participants of European ancestry, and have an allelic number greater than 1,000 (Extended Data Table 2 and Extended Data Fig. 3 ). The distributions of pathogenic, or likely pathogenic, ClinVar variant counts per participant, stratified by computed ancestry, filtered to only those variants that are found in individuals with an allele count of <40 are shown in Extended Data Fig. 4 . The potential medical implications of these known and new variants with respect to variant pathogenicity by ancestry are highlighted in a companion paper 17 . In particular, we find that the European ancestry subset has the highest rate of pathogenic variation (2.1%), which was twice the rate of pathogenic variation in individuals of East Asian ancestry 17 . The lower frequency of variants in East Asian individuals may be partially explained by the fact that the sample size in that group is small and there may be knowledge bias in the variant databases that is reducing the number of findings in some of the less-studied ancestry groups.

Genetic ancestry and relatedness

Genetic ancestry inference confirmed that 51.1% of the All of Us WGS dataset is derived from individuals of non-European ancestry. Briefly, the ancestry categories are based on the same labels used in gnomAD 18 . We trained a classifier on a 16-dimensional principal component analysis (PCA) space of a diverse reference based on 3,202 samples and 151,159 autosomal single-nucleotide polymorphisms. We projected the All of Us samples into the PCA space of the training data, based on the same single-nucleotide polymorphisms from the WGS data, and generated categorical ancestry predictions from the trained classifier ( Methods ). Continuous genetic ancestry fractions for All of Us samples were inferred using the same PCA data, and participants’ patterns of ancestry and admixture were compared to their self-identified race and ethnicity (Fig. 2 and Methods ). Continuous ancestry inference carried out using genome-wide genotypes yields highly concordant estimates.

Fig. 2: a,b, Uniform manifold approximation and projection (UMAP) representations of All of Us WGS PCA data with self-described race (a) and ethnicity (b) labels. c, Proportion of genetic ancestry per individual in six distinct and coherent ancestry groups defined by Human Genome Diversity Project and 1000 Genomes samples.

Kinship estimation confirmed that the All of Us WGS data consist largely of unrelated individuals, with about 85% (215,107) having no first- or second-degree relatives in the dataset (Supplementary Fig. 6). As many genomic analyses leverage unrelated individuals, we identified the smallest set of samples whose removal leaves no first- or second-degree relative pairs, retaining one individual from each kindred. This procedure yielded a maximal independent set of 231,442 individuals (about 94%) with genome sequence data in the current release ( Methods ).

Genetic determinants of LDL-C

As a measure of data quality and utility, we carried out a single-variant genome-wide association study (GWAS) for LDL-C, a trait with well-established genomic architecture ( Methods ). Of the 245,388 WGS participants, 91,749 had one or more LDL-C measurements. The All of Us LDL-C GWAS identified 20 well-established genome-wide significant loci, with minimal genomic inflation (Fig. 3 , Extended Data Table 3 and Supplementary Fig. 7 ). We compared the results to those of a recent multi-ethnic LDL-C GWAS in the National Heart, Lung, and Blood Institute (NHLBI) TOPMed study that included 66,329 ancestrally diverse (56% non-European ancestry) individuals 19 . We found a strong correlation between the effect estimates for NHLBI TOPMed genome-wide significant loci and those of All of Us ( R 2  = 0.98, P  < 1.61 × 10 −45 ; Fig. 3 , inset). Notably, the per-locus effect sizes observed in All of Us are decreased compared to those in TOPMed, which is in part due to differences in the underlying statistical model, differences in the ancestral composition of these datasets and differences in laboratory value ascertainment between EHR-derived data and epidemiology studies. A companion manuscript extended this work to identify common and rare genetic associations for three diseases (atrial fibrillation, coronary artery disease and type 2 diabetes) and two quantitative traits (height and LDL-C) in the All of Us dataset and identified very high concordance with previous efforts across all of these diseases and traits 20 .

Fig. 3: Manhattan plot demonstrating robust replication of 20 well-established LDL-C genetic loci among 91,749 individuals with one or more LDL-C measurements. The red horizontal line denotes the genome-wide significance threshold of P = 5 × 10−8. Inset, effect estimate (β) comparison between the NHLBI TOPMed LDL-C GWAS (x axis) and the All of Us LDL-C GWAS (y axis) for the subset of 194 independent variants (clumped with a 250-kb window and r² = 0.5) that reached genome-wide significance in NHLBI TOPMed.

Genotype-by-phenotype associations

As another measure of data quality and utility, we tested replication rates of previously reported phenotype–genotype associations in the five predicted genetic ancestry populations present in the Phenotype/Genotype Reference Map (PGRM): AFR, African ancestry; AMR, Latino/admixed American ancestry; EAS, East Asian ancestry; EUR, European ancestry; SAS, South Asian ancestry. The PGRM contains published associations in the GWAS catalogue in these ancestry populations that map to International Classification of Diseases-based phenotype codes 21 . This replication study looked across 4,947 variants, calculating replication rates for powered associations in each ancestry population. The overall replication rates for associations powered at 80% were: 72.0% (18/25) in AFR, 100% (13/13) in AMR, 46.6% (7/15) in EAS, 74.9% (1,064/1,421) in EUR, and 100% (1/1) in SAS. With the exception of the EAS results, these powered replication rates are comparable to those of the published PGRM analysis, in which replication rates across several single-site EHR-linked biobanks ranged from 76% to 85%. These results demonstrate the utility of the data and highlight opportunities for further work on the specifics of the All of Us population and the potential contribution of gene–environment interactions to genotype–phenotype mapping. They also motivate the development of methods for multi-site EHR phenotype data extraction, harmonization and genetic association studies.
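The replication-rate bookkeeping above reduces to counting powered associations per ancestry. A minimal sketch over hypothetical (ancestry, power, replicated) records:

```python
from collections import defaultdict

def replication_rates(associations, power_threshold=0.8):
    """Compute per-ancestry replication rates among powered associations.

    `associations` is an iterable of (ancestry, power, replicated) tuples,
    where `replicated` is True if the known association reached significance
    in the new cohort.
    """
    powered = defaultdict(int)
    replicated = defaultdict(int)
    for ancestry, power, hit in associations:
        if power >= power_threshold:  # only well-powered tests count
            powered[ancestry] += 1
            replicated[ancestry] += hit
    return {a: replicated[a] / powered[a] for a in powered}

# Hypothetical toy input; the underpowered EUR row (power 0.5) is ignored
toy = [("AFR", 0.90, True), ("AFR", 0.85, False),
       ("EUR", 0.95, True), ("EUR", 0.50, False)]
```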

More broadly, the All of Us resource highlights the opportunities to identify genotype–phenotype associations that differ across diverse populations 22 . For example, the Duffy blood group locus ( ACKR1 ) is more prevalent in individuals of AFR ancestry and individuals of AMR ancestry than in individuals of EUR ancestry. Although the phenome-wide association study of this locus highlights the well-established association of the Duffy blood group with lower white blood cell counts both in individuals of AFR and AMR ancestry 23 , 24 , it also revealed genetic-ancestry-specific phenotype patterns, with minimal phenotypic associations in individuals of EAS ancestry and individuals of EUR ancestry (Fig. 4 and Extended Data Table 4 ). Conversely, rs9273363 in the HLA-DQB1 locus is associated with increased risk of type 1 diabetes 25 , 26 and diabetic complications across ancestries, but only associates with increased risk of coeliac disease in individuals of EUR ancestry (Extended Data Fig. 5 ). Similarly, the TCF7L2 locus 27 strongly associates with increased risk of type 2 diabetes and associated complications across several ancestries (Extended Data Fig. 6 ). Association testing results are available in Supplementary Dataset 1 .

Fig. 4: Results of genetic-ancestry-stratified phenome-wide association analysis among unrelated individuals, highlighting ancestry-specific disease associations across the four most common genetic ancestries of participants. The Bonferroni-adjusted phenome-wide significance threshold (P < 2.88 × 10−5) is plotted as a red horizontal line. AFR (n = 34,037, minor allele fraction (MAF) 0.82); AMR (n = 28,901, MAF 0.10); EAS (n = 32,55, MAF 0.003); EUR (n = 101,613, MAF 0.007).

The cloud-based Researcher Workbench

All of Us genomic data are available in a secure, access-controlled cloud-based analysis environment: the All of Us Researcher Workbench. Unlike traditional data access models that require per-project approval, access in the Researcher Workbench is governed by a data passport model based on a researcher’s authenticated identity, institutional affiliation, and completion of self-service training and compliance attestation 28 . After gaining access, a researcher may create a new workspace at any time to conduct a study, provided that they comply with all Data Use Policies and self-declare their research purpose. This information is regularly audited and made accessible publicly on the All of Us Research Projects Directory. This streamlined access model is guided by the principles that: participants are research partners and maintaining their privacy and data security is paramount; their data should be made as accessible as possible for authorized researchers; and we should continually seek to remove unnecessary barriers to accessing and using All of Us data.

For researchers at institutions with an existing institutional data use agreement, access can be gained as soon as they complete the required verification and compliance steps. As of August 2023, 556 institutions have agreements in place, allowing more than 5,000 approved researchers to actively work on more than 4,400 projects. The median time for a researcher from initial registration to completion of these requirements is 28.6 h (10th percentile: 48 min, 90th percentile: 14.9 days), a fraction of the weeks to months it can take to assemble a project-specific application and have it reviewed by an access board with conventional access models.

Given that the size of the project’s phenotypic and genomic dataset is expected to reach 4.75 PB in 2023, the use of a central data store and cloud analysis tools will save funders an estimated US$16.5 million per year compared to the typical approach of allowing researchers to download genomic data. Storing one copy of these data per institution across the 556 registered institutions would cost about US$1.16 billion per year; by contrast, storing a central cloud copy costs about US$1.14 million per year, a 99.9% saving. Cloud infrastructure also democratizes data access, particularly for researchers who lack high-performance local compute resources.
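The 99.9% figure follows directly from the two storage costs quoted above:

```python
# Annual storage cost comparison (figures from the text; US dollars per year)
per_institution_total = 1.16e9   # 556 institutions each storing a full copy
central_cloud_copy = 1.14e6      # one shared copy in the cloud

saving_fraction = 1 - central_cloud_copy / per_institution_total
print(f"{saving_fraction:.1%}")  # → 99.9%
```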

Here we present the All of Us Research Program’s approach to generating diverse clinical-grade genomic data at an unprecedented scale. We present the data release of about 245,000 genome sequences as part of a scalable framework that will grow to include genetic information and health data for one million or more people living across the USA. Our observations permit several conclusions.

First, the All of Us programme is making a notable contribution to improving the study of human biology through purposeful inclusion of under-represented individuals at scale 29 , 30 . Of the participants with genomic data in All of Us, 45.92% self-identified as a non-European race or ethnicity. This diversity enabled identification of more than 275 million new genetic variants across the dataset not previously captured by other large-scale genome aggregation efforts with diverse participants that have submitted variation to dbSNP v153, such as NHLBI TOPMed 31 freeze 8 (Extended Data Table 1 ). In contrast to gnomAD, All of Us permits individual-level genotype access with detailed phenotype data for all participants. Furthermore, unlike many genomics resources, All of Us is uniformly consented for general research use and enables researchers to go from initial account creation to individual-level data access in as little as a few hours. The All of Us cohort is significantly more diverse than those of other large contemporary research studies generating WGS data 32 , 33 . This enables a more equitable future for precision medicine (for example, through constructing polygenic risk scores that are appropriately calibrated to diverse populations 34 , 35 as the eMERGE programme has done leveraging All of Us data 36 , 37 ). Developing new tools and regulatory frameworks to enable analyses across multiple biobanks in the cloud to harness the unique strengths of each is an active area of investigation addressed in a companion paper to this work 38 .

Second, the All of Us Researcher Workbench embodies the programme’s design philosophy of open science, reproducible research, equitable access and transparency to researchers and to research participants 26 . Importantly, for research studies, no group of data users should have privileged access to All of Us resources based on anything other than data protection criteria. Although the All of Us Researcher Workbench initially targeted onboarding US academic, health care and non-profit organizations, it has recently expanded to international researchers. We anticipate further genomic and phenotypic data releases at regular intervals with data available to all researcher communities. We also anticipate additional derived data and functionality to be made available, such as reference data, structural variants and a service for array imputation using the All of Us genomic data.

Third, All of Us enables studying human biology at an unprecedented scale. The programmatic goal of sequencing one million or more genomes has required harnessing the output of multiple sequencing centres. Previous work has focused on achieving functional equivalence in data processing and joint calling pipelines 39 . To achieve clinical-grade data equivalence, All of Us required protocol equivalence at both sequencing production level and data processing across the sequencing centres. Furthermore, previous work has demonstrated the value of joint calling at scale 10 , 18 . The new GVS framework developed by the All of Us programme enables joint calling at extreme scales (Code availability). Finally, the provision of data access through cloud-native tools enables scalable and secure access and analysis to researchers while simultaneously enabling the trust of research participants and transparency underlying the All of Us data passport access model.

The clinical-grade sequencing carried out by All of Us enables not only research, but also the return of value to participants through clinically relevant genetic results and health-related traits for those who opt in to receiving this information. In the years ahead, we anticipate that this partnership with All of Us participants will enable researchers to move beyond large-scale genomic discovery to understanding the consequences of implementing genomic medicine at scale.

The All of Us cohort

All of Us aims to engage a longitudinal cohort of one million or more US participants, with a focus on including populations that have historically been under-represented in biomedical research. Details of the All of Us cohort have been described previously 5 . Briefly, the primary objective is to build a robust research resource that can facilitate the exploration of biological, clinical, social and environmental determinants of health and disease. The programme will collect and curate health-related data and biospecimens, and these data and biospecimens will be made broadly available for research uses. Health data are obtained through the electronic medical record and through participant surveys. Survey templates can be found on our public website: https://www.researchallofus.org/data-tools/survey-explorer/ . Adults 18 years and older who have the capacity to consent and reside in the USA or a US territory at present are eligible. Informed consent for all participants is conducted in person or through an eConsent platform that includes primary consent, HIPAA Authorization for Research use of EHRs and other external health data, and Consent for Return of Genomic Results. The protocol was reviewed by the Institutional Review Board (IRB) of the All of Us Research Program. The All of Us IRB follows the regulations and guidance of the NIH Office for Human Research Protections for all studies, ensuring that the rights and welfare of research participants are overseen and protected uniformly.

Data accessibility through a ‘data passport’

Authorization for access to participant-level data in All of Us is based on a ‘data passport’ model, through which authorized researchers do not need IRB review for each research project. The data passport is required for gaining data access to the Researcher Workbench and for creating workspaces to carry out research projects using All of Us data. At present, data passports are authorized through a six-step process that includes affiliation with an institution that has signed a Data Use and Registration Agreement, account creation, identity verification, completion of ethics training, and attestation to a data user code of conduct. Reported results follow the All of Us Data and Statistics Dissemination Policy, which, to protect participant privacy, disallows disclosure of group counts under 20 without prior approval 40 .

At present, All of Us gathers EHR data from about 50 health care organizations that are funded to recruit and enrol participants as well as transfer EHR data for those participants who have consented to provide them. Data stewards at each provider organization harmonize their local data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, and then submit it to the All of Us Data and Research Center (DRC) so that it can be linked with other participant data and further curated for research use. OMOP is a common data model standardizing health information from disparate EHRs to common vocabularies and organized into tables according to data domains. EHR data are updated from the recruitment sites and sent to the DRC quarterly. Updated data releases to the research community occur approximately once a year. Supplementary Table 6 outlines the OMOP concepts collected by the DRC quarterly from the recruitment sites.
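As an illustration of the harmonization step described above, the sketch below maps one hypothetical local EHR lab record into a row shaped like the OMOP measurement table. The field names follow the OMOP Common Data Model, but the concept IDs are placeholders, not real vocabulary values:

```python
# Hypothetical local EHR record (shape and values are illustrative only)
local_record = {"patient": "P123", "test": "LDL (calc)",
                "result": 112.0, "units": "mg/dL", "date": "2021-06-01"}

# Placeholder mappings from local names to standard concept IDs; a real
# data steward would resolve these against the OMOP vocabularies
CONCEPT_MAP = {"LDL (calc)": 300001}
UNIT_MAP = {"mg/dL": 8840}

# One row for the OMOP "measurement" domain table
omop_row = {
    "person_id": local_record["patient"],
    "measurement_concept_id": CONCEPT_MAP[local_record["test"]],
    "measurement_date": local_record["date"],
    "value_as_number": local_record["result"],
    "unit_concept_id": UNIT_MAP[local_record["units"]],
    "measurement_source_value": local_record["test"],  # keep local name for audit
}
```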

Biospecimen collection and processing

Participants who consented to participate in All of Us donated fresh whole blood (4 ml EDTA and 10 ml EDTA) as a primary source of DNA. The All of Us Biobank managed by the Mayo Clinic extracted DNA from 4 ml EDTA whole blood, and DNA was stored at −80 °C at an average concentration of 150 ng µl −1 . The buffy coat isolated from 10 ml EDTA whole blood has been used for extracting DNA in the case of initial extraction failure or absence of 4 ml EDTA whole blood. The Biobank plated 2.4 µg DNA with a concentration of 60 ng µl −1 in duplicate for array and WGS samples. The samples are distributed to All of Us Genome Centers weekly, and a negative (empty well) control and National Institute of Standards and Technology controls are incorporated every two months for QC purposes.

Genome Center sample receipt, accession and QC

On receipt of DNA sample shipments, the All of Us Genome Centers carry out an inspection of the packaging and sample containers to ensure that sample integrity has not been compromised during transport and to verify that the sample containers correspond to the shipping manifest. QC of the submitted samples also includes DNA quantification, using routine procedures to confirm volume and concentration (Supplementary Table 7 ). Any issues or discrepancies are recorded, and affected samples are put on hold until resolved. Samples that meet quality thresholds are accessioned in the Laboratory Information Management System, and sample aliquots are prepared for library construction processing (for example, normalized with respect to concentration and volume).

WGS library construction, sequencing and primary data QC

The DNA sample is first sheared using a Covaris sonicator and is then size-selected using AMPure XP beads to restrict the range of library insert sizes. Using the PCR Free Kapa HyperPrep library construction kit, enzymatic steps are completed to repair the jagged ends of DNA fragments, add proper A-base segments, and ligate indexed adapter barcode sequences onto samples. Excess adaptors are removed using AMPure XP beads for a final clean-up. Libraries are quantified using quantitative PCR with the Illumina Kapa DNA Quantification Kit and then normalized and pooled for sequencing (Supplementary Table 7 ).

Pooled libraries are loaded on the Illumina NovaSeq 6000 instrument. The data from the initial sequencing run are used to QC individual libraries and to remove non-conforming samples from the pipeline. The data are also used to calibrate the pooling volume of each individual library and re-pool the libraries for additional NovaSeq sequencing to reach an average coverage of 30×.

After demultiplexing, WGS analysis occurs on the Illumina DRAGEN platform. The DRAGEN pipeline consists of highly optimized algorithms for mapping, aligning, sorting, duplicate marking and haplotype variant calling and makes use of platform features such as compression and BCL conversion. Alignment uses the GRCh38dh reference genome. QC data are collected at every stage of the analysis protocol, providing high-resolution metrics required to ensure data consistency for large-scale multiplexing. The DRAGEN pipeline produces a large number of metrics that cover lane, library, flow cell, barcode and sample-level metrics for all runs as well as assessing contamination and mapping quality. The All of Us Genome Centers use these metrics to determine pass or fail for each sample before submitting the CRAM files to the All of Us DRC. For mapping and variant calling, all Genome Centers have harmonized on a set of DRAGEN parameters, which ensures consistency in processing (Supplementary Table 2 ).

Every step through the WGS procedure is rigorously controlled by predefined QC measures. Various control mechanisms and acceptance criteria were established during WGS assay validation. Specific metrics for reviewing and releasing genome data are: mean coverage (threshold of ≥30×), genome coverage (threshold of ≥90% at 20×), coverage of hereditary disease risk genes (threshold of ≥95% at 20×), aligned Q30 bases (threshold of ≥8 × 10 10 ), contamination (threshold of ≤1%) and concordance to independently processed array data.
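A minimal sketch of the release gate implied by these thresholds (the metric names are our own shorthand, not the pipeline's):

```python
# Release thresholds from the text; each metric must meet or exceed its floor
WGS_QC_THRESHOLDS = {
    "mean_coverage": 30.0,       # mean coverage >= 30x
    "genome_cov_20x": 0.90,      # >= 90% of genome covered at 20x
    "hdr_gene_cov_20x": 0.95,    # >= 95% of hereditary disease risk genes at 20x
    "aligned_q30_bases": 8e10,   # >= 8 x 10^10 aligned Q30 bases
}
MAX_CONTAMINATION = 0.01         # contamination <= 1%

def wgs_sample_passes(metrics):
    """Return True if a sample's metrics clear every release threshold."""
    if metrics["contamination"] > MAX_CONTAMINATION:
        return False
    return all(metrics[name] >= floor for name, floor in WGS_QC_THRESHOLDS.items())

good = {"mean_coverage": 34.2, "genome_cov_20x": 0.93, "hdr_gene_cov_20x": 0.97,
        "aligned_q30_bases": 9.1e10, "contamination": 0.002}
```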

Array genotyping

Samples are processed for genotyping at three All of Us Genome Centers (Broad, Johns Hopkins University and University of Washington). DNA samples are received from the Biobank and the process is facilitated by the All of Us genomics workflow described above. All three centres used an identical array product, scanners, resource files and genotype calling software for array processing to reduce batch effects. Each centre has its own Laboratory Information Management System that manages workflow control, sample and reagent tracking, and centre-specific liquid handling robotics.

Samples are processed using the Illumina Global Diversity Array (GDA) with Illumina Infinium LCG chemistry, following the automated protocol, and scanned on Illumina iSCANs with Automated Array Loaders. Illumina IAAP software converts raw data (IDAT files; 2 per sample) into a single GTC file per sample using the BPM file (defines strand, probe sequences and illumicode address) and the EGT file (defines the relationship between intensities and genotype calls). Files used for this data release are: GDA-8v1-0_A5.bpm, GDA-8v1-0_A1_ClusterFile.egt, gentrain v3, reference hg19 and gencall cutoff 0.15. The GDA array assays a total of 1,914,935 variant positions including 1,790,654 single-nucleotide variants, 44,172 indels, 9,935 intensity-only probes for CNV calling, and 70,174 duplicates (same position, different probes). Picard GtcToVcf is used to convert the GTC files to VCF format. Resulting VCF and IDAT files are submitted to the DRC for ingestion and further processing. The VCF file contains assay name, chromosome, position, genotype calls, quality score, raw and normalized intensities, B allele frequency and log R ratio values. Each genome centre runs the GDA array under Clinical Laboratory Improvement Amendments-compliant protocols. The GTC files are parsed and metrics are uploaded to in-house Laboratory Information Management Systems for QC review.

At batch level (each set of 96-well plates run together in the laboratory at one time), each genome centre includes positive control samples that are required to have >98% call rate and >99% concordance to existing data to approve release of the batch of data. At the sample level, the call rate and sex are the key QC determinants 41 . Contamination is also measured using BAFRegress 42 and reported out as metadata. Any sample with a call rate below 98% is repeated one time in the laboratory. Genotyped sex is determined by plotting normalized x versus normalized y intensity values for a batch of samples. Any sample discordant with ‘sex at birth’ reported by the All of Us participant is flagged for further detailed review and repeated one time in the laboratory. If several sex-discordant samples are clustered on an array or on a 96-well plate, the entire array or plate will have data production repeated. Samples identified with sex chromosome aneuploidies are also reported back as metadata (XXX, XXY, XYY and so on). A final processing status of ‘pass’, ‘fail’ or ‘abandon’ is determined before release of data to the All of Us DRC. An array sample will pass if the call rate is >98% and the genotyped sex and sex at birth are concordant (or the sex at birth is not applicable). An array sample will fail if the genotyped sex and the sex at birth are discordant. An array sample will have the status of abandon if the call rate is <98% after at least two attempts at the genome centre.
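The pass/fail/abandon decision described above can be sketched as a small function (a simplification; the production LIMS tracks more state, and the names here are ours):

```python
def array_sample_status(call_rate, genotyped_sex, sex_at_birth, attempts):
    """Sample-level status per the rules above: sex discordance fails the
    sample outright; a call rate of <=98% triggers one laboratory repeat,
    then abandonment after at least two attempts."""
    if sex_at_birth not in (None, "not applicable") and genotyped_sex != sex_at_birth:
        return "fail"            # genotyped sex discordant with sex at birth
    if call_rate > 0.98:
        return "pass"
    if attempts >= 2:
        return "abandon"         # call rate still low after two attempts
    return "repeat"              # re-run once in the laboratory
```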

Data from the arrays are used for participant return of genetic ancestry and non-health-related traits for those who consent, and they are also used to facilitate additional QC of the matched WGS data. Contamination is assessed in the array data to determine whether DNA re-extraction is required before WGS. Re-extraction is prompted by level of contamination combined with consent status for return of results. The arrays are also used to confirm sample identity between the WGS data and the matched array data by assessing concordance at 100 unique sites. To establish concordance, a fingerprint file of these 100 sites is provided to the Genome Centers to assess concordance with the same sites in the WGS data before CRAM submission.
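The identity check can be sketched as a simple concordance fraction over the shared fingerprint sites (a simplification of the production comparison, which works from the fingerprint file and WGS calls before CRAM submission):

```python
def fingerprint_concordance(array_gts, wgs_gts):
    """Fraction of matching genotype calls at shared fingerprint sites.

    Each argument maps a site key, e.g. (chrom, pos), to a genotype call,
    restricted to the ~100 fingerprint sites.
    """
    shared = array_gts.keys() & wgs_gts.keys()
    if not shared:
        return 0.0
    matches = sum(array_gts[s] == wgs_gts[s] for s in shared)
    return matches / len(shared)

# Toy example: two of three sites agree
array = {("chr1", 12345): "A/G", ("chr2", 67890): "T/T", ("chr3", 111): "C/C"}
wgs   = {("chr1", 12345): "A/G", ("chr2", 67890): "T/T", ("chr3", 111): "C/T"}
```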

Genomic data curation

As seen in Extended Data Fig. 2, we generate a joint call set for all WGS samples and make these data available in their entirety and by sample subsets to researchers. A breakdown of the frequencies, stratified by computed ancestries for which we had more than 10,000 participants, can be found in Extended Data Fig. 3. The joint call set process allows us to leverage information across samples to improve QC and increase accuracy.

Single-sample QC

If a sample fails single-sample QC, it is excluded from the release and is not reported in this document. These tests detect sample swaps, cross-individual contamination and sample preparation errors. In some cases, we carry out these tests twice (at both the Genome Center and the DRC), for two reasons: to confirm internal consistency between sites; and to mark samples as passing (or failing) QC on the basis of the research pipeline criteria. The single-sample QC process accepts a higher contamination rate than the clinical pipeline (0.03 for the research pipeline versus 0.01 for the clinical pipeline), but otherwise uses identical thresholds. The list of specific QC processes, passing criteria, error modes addressed and an overview of the results can be found in Supplementary Table 3 .

Joint call set QC

During joint calling, we carry out additional QC steps using information that is available across samples including hard thresholds, population outliers, allele-specific filters, and sensitivity and precision evaluation. Supplementary Table 4 summarizes both the steps that we took and the results obtained for the WGS data. More detailed information about the methods and specific parameters can be found in the All of Us Genomic Research Data Quality Report 36 .

Batch effect analysis

We analysed cross-sequencing-centre batch effects in the joint call set. To quantify the batch effect, we calculated Cohen’s d (ref.  43 ) for four metrics (insertion/deletion ratio, single-nucleotide polymorphism count, indel count and single-nucleotide polymorphism transition/transversion ratio) across the three genome sequencing centres (Baylor College of Medicine, Broad Institute and University of Washington), stratified by computed ancestry and by region of the genome (whole genome, high-confidence calling, repetitive, GC content of >0.85, GC content of <0.15, low mappability, the ACMG59 genes and regions of large duplications (>1 kb)). Using random batches as a control set, all comparisons had a Cohen’s d of <0.35. Here we report any Cohen’s d results of >0.5, a threshold chosen before this analysis that conventionally marks a medium effect size 44 .
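Cohen's d as used here is the standardized difference of means with a pooled standard deviation; a minimal implementation:

```python
from statistics import mean, stdev

def cohens_d(x, y):
    """Cohen's d: difference of means divided by the pooled s.d."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5
```

In the batch-effect analysis this would be evaluated per metric and per pair of sequencing centres, flagging any comparison with |d| > 0.5.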

We found a batch effect in indel counts (Cohen’s d of 0.53) across the entire genome between the Broad Institute and the University of Washington, but this was driven by repetitive and low-mappability regions. We found no batch effects with Cohen’s d of >0.5 in the ratio metrics or in any metrics in the high-confidence calling, low or high GC content, or ACMG59 regions. A complete list of the batch effects with Cohen’s d of >0.5 is given in Supplementary Table 8.

Sensitivity and precision evaluation

To determine sensitivity and precision, we included four well-characterized control samples (National Institute of Standards and Technology Genome in a Bottle samples HG-001, HG-003, HG-004 and HG-005), sequenced with the same protocol as the All of Us samples. Of note, these control samples were not included in the data released to researchers. We used the corresponding published set of variant calls for each sample as the ground truth in our sensitivity and precision calculations, restricted to the high-confidence calling region defined by Genome in a Bottle v4.2.1. To be called a true positive, a variant must match the chromosome, position, reference allele, alternate allele and zygosity. At sites with multiple alternative alleles, each alternative allele is considered separately. Sensitivity and precision results are reported in Supplementary Table 5.
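Under the matching criteria above, sensitivity and precision reduce to set arithmetic over (chromosome, position, ref, alt, zygosity) tuples. A minimal sketch:

```python
def sensitivity_precision(truth, calls):
    """Sensitivity and precision of a call set against ground truth.

    Each call is a (chrom, pos, ref, alt, zygosity) tuple; multi-allelic
    sites contribute one tuple per alternate allele.
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)
    fn = len(truth - calls)   # truth variants we failed to call
    fp = len(calls - truth)   # calls absent from the truth set
    return tp / (tp + fn), tp / (tp + fp)

# Toy example: one shared call, one missed truth variant, one spurious call
truth = {("1", 100, "A", "G", "het"), ("1", 200, "C", "T", "hom")}
calls = {("1", 100, "A", "G", "het"), ("1", 300, "G", "A", "het")}
```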

Genetic ancestry inference

We computed categorical ancestry for all WGS samples in All of Us and made these available to researchers. These predictions are also the basis for population allele frequency calculations in the Genomic Variants section of the public Data Browser. We used the high-quality set of sites to determine an ancestry label for each sample. The ancestry categories are based on the same labels used in gnomAD 18 , the Human Genome Diversity Project (HGDP) 45 and 1000 Genomes 1 : African (AFR); Latino/admixed American (AMR); East Asian (EAS); Middle Eastern (MID); European (EUR), composed of Finnish (FIN) and Non-Finnish European (NFE); Other (OTH), not belonging to one of the other ancestries or is an admixture; South Asian (SAS).

We trained a random forest classifier 46 on a training set of HGDP and 1000 Genomes autosomal variants obtained from gnomAD 11 . We generated the first 16 principal components (PCs) of the training sample genotypes (using hwe_normalized_pca in Hail) at the high-quality variant sites for use as the feature vector for each training sample. We used the truth labels from the sample metadata, which can be found alongside the VCFs. Note that we do not train the classifier on samples labelled as Other; instead, we use the classifier’s label probabilities (‘confidence’) across the other ancestries to decide when a sample should be labelled Other.
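The assignment of the Other label can be sketched as a confidence rule over the classifier's per-class probabilities (the cutoff below is illustrative; the programme's exact threshold is not stated here):

```python
def assign_ancestry(class_probs, min_confidence=0.75):
    """Pick the most probable ancestry label, or 'OTH' when the classifier
    is not confident enough in any single ancestry.

    `class_probs` maps ancestry label -> probability from the trained
    random forest (the classifier is never trained on 'OTH' itself).
    The 0.75 cutoff is an illustrative choice, not a published value.
    """
    label = max(class_probs, key=class_probs.get)
    return label if class_probs[label] >= min_confidence else "OTH"
```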

To determine the ancestry of All of Us samples, we project the All of Us samples into the PCA space of the training data and apply the classifier. As a proxy for the accuracy of our All of Us predictions, we look at the concordance between the survey results and the predicted ancestry. The concordance between self-reported ethnicity and the ancestry predictions was 87.7%.

PC data from All of Us samples and the HGDP and 1000 Genomes samples were used to compute individual participant genetic ancestry fractions for All of Us samples using the Rye program. Rye uses PC data to carry out rapid and accurate genetic ancestry inference on biobank-scale datasets 47 . HGDP and 1000 Genomes reference samples were used to define a set of six distinct and coherent ancestry groups—African, East Asian, European, Middle Eastern, Latino/admixed American and South Asian—corresponding to participant self-identified race and ethnicity groups. Rye was run on the first 16 PCs, using the defined reference ancestry groups to assign ancestry group fractions to individual All of Us participant samples.

Relatedness

We calculated the kinship score using the Hail pc_relate function and reported any pairs with a kinship score above 0.1. The kinship score is half of the fraction of genetic material shared (ranging from 0.0 to 0.5). We then determined the maximal independent set 41 for related samples, identifying a maximally unrelated set of 231,442 samples (94%) at a kinship score threshold of 0.1.
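The pruning step amounts to finding a maximal independent set on the relatedness graph. The production pipeline uses Hail's implementation; the pure-Python greedy sketch below illustrates the idea on hypothetical kinship pairs:

```python
from collections import defaultdict

def prune_related(pairs, kin_threshold=0.1):
    """Greedily drop the fewest samples so no remaining pair exceeds the
    kinship cutoff.

    `pairs` is an iterable of (sample_i, sample_j, kinship). Repeatedly
    removing the sample involved in the most remaining related pairs is a
    standard greedy approximation to the maximal independent set.
    """
    edges = {(i, j) for i, j, kin in pairs if kin > kin_threshold}
    removed = set()
    while edges:
        degree = defaultdict(int)
        for i, j in edges:
            degree[i] += 1
            degree[j] += 1
        worst = max(degree, key=degree.get)   # most-connected sample
        removed.add(worst)
        edges = {(i, j) for i, j in edges if worst not in (i, j)}
    return removed

# A trio: child C is related to both parents A and B; dropping C suffices
trio = [("A", "C", 0.25), ("B", "C", 0.25), ("A", "B", 0.01)]
```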

LDL-C common variant GWAS

The phenotypic data were extracted from the Curated Data Repository (CDR, Controlled Tier Dataset v7) in the All of Us Researcher Workbench. The All of Us Cohort Builder and Dataset Builder were used to extract all LDL cholesterol measurements from the Lab and Measurements criteria in EHR data for all participants who have WGS data. The most recent measurement was selected as the phenotype and adjusted for statin use 19 , age and sex. A rank-based inverse normal transformation was applied to this continuous trait to increase power and reduce type I error. Analysis was carried out on the Hail MatrixTable representation of the All of Us WGS joint-called data, removing monomorphic variants, variants with a call rate of <95% and variants with extreme deviation from Hardy–Weinberg equilibrium ( P  < 10 −15 ). A linear regression was carried out with REGENIE 48 , which accounts for relatedness, on variants with a minor allele frequency of >5%, further adjusting for the first five ancestry PCs. The final analysis included 34,924 participants and 8,589,520 variants.
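The rank-based inverse normal transformation can be sketched with only the standard library (the offset convention below is one common choice; the exact variant used is not specified in the text):

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transformation of a continuous trait.

    Each value is replaced by the normal quantile of its rank:
    Phi^-1((rank - offset) / n). Ties receive their average rank; the
    offset of 0.5 is one common convention (Blom's 3/8 is another).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # find the run of tied values starting at sorted position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    norm = NormalDist()
    return [norm.inv_cdf((r - offset) / n) for r in ranks]
```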

Genotype-by-phenotype replication

We tested replication rates of known phenotype–genotype associations in three of the four largest populations: EUR, AFR and EAS. The AMR population was not included because it has no registered GWAS. This method is a conceptual extension of the original GWAS × phenome-wide association study, which replicated 66% of powered associations in a single EHR-linked biobank 49 . The PGRM is an expansion of this work by Bastarache et al., based on associations in the GWAS catalogue 50 as of June 2020 (ref.  51 ). After directly matching the Experimental Factor Ontology terms to phecodes, the authors identified 8,085 unique loci and 170 unique phecodes that compose the PGRM, and showed replication rates in several EHR-linked biobanks ranging from 76% to 85%. For this analysis, we used the EUR-, AFR- and EAS-based maps, considering only catalogue associations significant at P  < 5 × 10 −8 .

The main tools used were the Python package Hail for data extraction, plink for genomic associations, and the R packages PheWAS and pgrm for further analysis and visualization. The phenotypes, participant-reported sex at birth and year of birth were extracted from the All of Us CDR (Controlled Tier Dataset v7). These phenotypes were loaded into a plink-compatible format using the PheWAS package, and related samples were removed by subsetting to the maximally unrelated dataset ( n  = 231,442). Only samples with EHR data were kept, filtered by selected loci, and annotated with demographic and phenotypic information extracted from the CDR and ancestry predictions provided by All of Us, resulting in 181,345 participants for downstream analysis. The variants in the PGRM were filtered by a minimum population-specific allele frequency of >1% or a population-specific allele count of >100, leaving 4,986 variants. Results with at least 20 cases in the ancestry group were included. A series of Firth logistic regression tests was then carried out with phecodes as the outcome and variants as the predictor, adjusting for age, sex (for non-sex-specific phenotypes) and the first three genomic PCs as covariates. The PGRM was annotated with power calculations based on the case counts and reported allele frequencies; associations with power of 80% or greater were considered powered in this analysis.
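The inclusion thresholds described above can be restated as a small filter sketch (the function names are illustrative, not from the actual analysis code):

```python
# Sketch of the PGRM inclusion criteria described in the text. The
# thresholds are restated from the paper; the function names are
# hypothetical, for illustration only.

def keep_variant(pop_allele_freq, pop_allele_count):
    """Keep if population-specific AF > 1% or allele count > 100."""
    return pop_allele_freq > 0.01 or pop_allele_count > 100

def include_association(n_cases, power):
    """Include with at least 20 cases; 'powered' means power >= 80%."""
    return n_cases >= 20, power >= 0.80

assert keep_variant(0.005, 150)        # rare by AF, common by count
assert not keep_variant(0.005, 50)     # fails both thresholds
included, powered = include_association(n_cases=25, power=0.85)
```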

Reporting summary

Further information on research design is available in the  Nature Portfolio Reporting Summary linked to this article.

Data availability

The All of Us Research Hub has a tiered data access data passport model with three data access tiers. The Public Tier dataset contains only aggregate data with identifiers removed. These data are available to the public through Data Snapshots ( https://www.researchallofus.org/data-tools/data-snapshots/ ) and the public Data Browser ( https://databrowser.researchallofus.org/ ). The Registered Tier curated dataset contains individual-level data, available only to approved researchers on the Researcher Workbench. At present, the Registered Tier includes data from EHRs, wearables and surveys, as well as physical measurements taken at the time of participant enrolment. The Controlled Tier dataset contains all data in the Registered Tier and additionally genomic data in the form of WGS and genotyping arrays, previously suppressed demographic data fields from EHRs and surveys, and unshifted dates of events. At present, Registered Tier and Controlled Tier data are available to researchers at academic institutions, non-profit institutions, and both non-profit and for-profit health care institutions. Work is underway to begin extending access to additional audiences, including industry-affiliated researchers. Researchers have the option to register for Registered Tier and/or Controlled Tier access by completing the All of Us Researcher Workbench access process, which includes identity verification and All of Us-specific training in research involving human participants ( https://www.researchallofus.org/register/ ). Researchers may create a new workspace at any time to conduct any research study, provided that they comply with all Data Use Policies and self-declare their research purpose. This information is made accessible publicly on the All of Us Research Projects Directory at https://allofus.nih.gov/protecting-data-and-privacy/research-projects-all-us-data .

Code availability

The GVS code is available at https://github.com/broadinstitute/gatk/tree/ah_var_store/scripts/variantstore . The LDL GWAS pipeline is available as a demonstration project in the Featured Workspace Library on the Researcher Workbench ( https://workbench.researchallofus.org/workspaces/aou-rw-5981f9dc/aouldlgwasregeniedsubctv6duplicate/notebooks ).

The 1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526 , 68–74 (2015).

Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577 , 179–189 (2020).

Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570 , 514–518 (2019).

Lewis, A. C. F. et al. Getting genetic ancestry right for science and society. Science 376 , 250–252 (2022).

All of Us Program Investigators. The “All of Us” Research Program. N. Engl. J. Med. 381 , 668–676 (2019).

Ramirez, A. H., Gebo, K. A. & Harris, P. A. Progress with the All of Us Research Program: opening access for researchers. JAMA 325 , 2441–2442 (2021).

Ramirez, A. H. et al. The All of Us Research Program: data quality, utility, and diversity. Patterns 3 , 100570 (2022).

Overhage, J. M., Ryan, P. B., Reich, C. G., Hartzema, A. G. & Stang, P. E. Validation of a common data model for active safety surveillance research. J. Am. Med. Inform. Assoc. 19 , 54–60 (2012).

Venner, E. et al. Whole-genome sequencing as an investigational device for return of hereditary disease risk and pharmacogenomic results as part of the All of Us Research Program. Genome Med. 14 , 34 (2022).

Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536 , 285–291 (2016).

Tiao, G. & Goodrich, J. gnomAD v3.1 New Content, Methods, Annotations, and Data Availability ; https://gnomad.broadinstitute.org/news/2020-10-gnomad-v3-1-new-content-methods-annotations-and-data-availability/ .

Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625 , 92–100 (2024).

Zook, J. M. et al. An open resource for accurately benchmarking small variant and reference calls. Nat. Biotechnol. 37 , 561–566 (2019).

Krusche, P. et al. Best practices for benchmarking germline small-variant calls in human genomes. Nat. Biotechnol. 37 , 555–560 (2019).

Stromberg, M. et al. Nirvana: clinical grade variant annotator. In Proc. 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics 596 (Association for Computing Machinery, 2017).

Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29 , 308–311 (2001).

Venner, E. et al. The frequency of pathogenic variation in the All of Us cohort reveals ancestry-driven disparities. Commun. Biol. https://doi.org/10.1038/s42003-023-05708-y (2024).

Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581 , 434–443 (2020).

Selvaraj, M. S. et al. Whole genome sequence analysis of blood lipid levels in >66,000 individuals. Nat. Commun. 13 , 5995 (2022).

Wang, X. et al. Common and rare variants associated with cardiometabolic traits across 98,622 whole-genome sequences in the All of Us research program. J. Hum. Genet. 68 , 565–570 (2023).

Bastarache, L. et al. The phenotype-genotype reference map: improving biobank data science through replication. Am. J. Hum. Genet. 110 , 1522–1533 (2023).

Bianchi, D. W. et al. The All of Us Research Program is an opportunity to enhance the diversity of US biomedical research. Nat. Med. https://doi.org/10.1038/s41591-023-02744-3 (2024).

Van Driest, S. L. et al. Association between a common, benign genotype and unnecessary bone marrow biopsies among African American patients. JAMA Intern. Med. 181 , 1100–1105 (2021).

Chen, M.-H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182 , 1198–1213 (2020).

Chiou, J. et al. Interpreting type 1 diabetes risk with genetics and single-cell epigenomics. Nature 594 , 398–402 (2021).

Hu, X. et al. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk. Nat. Genet. 47 , 898–905 (2015).

Grant, S. F. A. et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 38 , 320–323 (2006).

All of Us Research Program. Framework for Access to All of Us Data Resources v1.1 (2021); https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/data&tools/data-access-use/AoU_Data_Access_Framework_508.pdf .

Abul-Husn, N. S. & Kenny, E. E. Personalized medicine and the power of electronic health records. Cell 177 , 58–69 (2019).

Mapes, B. M. et al. Diversity and inclusion for the All of Us research program: A scoping review. PLoS ONE 15 , e0234962 (2020).

Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590 , 290–299 (2021).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562 , 203–209 (2018).

Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607 , 732–740 (2022).

Kurniansyah, N. et al. Evaluating the use of blood pressure polygenic risk scores across race/ethnic background groups. Nat. Commun. 14 , 3202 (2023).

Hou, K. et al. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat. Genet. 55 , 549–558 (2022).

Linder, J. E. et al. Returning integrated genomic risk and clinical recommendations: the eMERGE study. Genet. Med. 25 , 100006 (2023).

Lennon, N. J. et al. Selection, optimization and validation of ten chronic disease polygenic risk scores for clinical implementation in diverse US populations. Nat. Med. https://doi.org/10.1038/s41591-024-02796-z (2024).

Deflaux, N. et al. Demonstrating paths for unlocking the value of cloud genomics through cross cohort analysis. Nat. Commun. 14 , 5419 (2023).

Regier, A. A. et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat. Commun. 9 , 4038 (2018).

All of Us Research Program. Data and Statistics Dissemination Policy (2020); https://www.researchallofus.org/wp-content/themes/research-hub-wordpress-theme/media/2020/05/AoU_Policy_Data_and_Statistics_Dissemination_508.pdf .

Laurie, C. C. et al. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet. Epidemiol. 34 , 591–602 (2010).

Jun, G. et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am. J. Hum. Genet. 91 , 839–848 (2012).

Cohen, J. Statistical Power Analysis for the Behavioral Sciences (Routledge, 2013).

Andrade, C. Mean difference, standardized mean difference (SMD), and their use in meta-analysis. J. Clin. Psychiatry 81 , 20f13681 (2020).

Cavalli-Sforza, L. L. The Human Genome Diversity Project: past, present and future. Nat. Rev. Genet. 6 , 333–340 (2005).

Ho, T. K. Random decision forests. In Proc. 3rd International Conference on Document Analysis and Recognition (IEEE Computer Society Press, 2002).

Conley, A. B. et al. Rye: genetic ancestry inference at biobank scale. Nucleic Acids Res. 51 , e44 (2023).

Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53 , 1097–1103 (2021).

Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31 , 1102–1111 (2013).

Buniello, A. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47 , D1005–D1012 (2019).

Bastarache, L. et al. The phenotype-genotype reference map: improving biobank data science through replication. Am. J. Hum. Genet. 110 , 1522–1533 (2023).

Download references

Acknowledgements

The All of Us Research Program is supported by the National Institutes of Health, Office of the Director: Regional Medical Centers (OT2 OD026549; OT2 OD026554; OT2 OD026557; OT2 OD026556; OT2 OD026550; OT2 OD026552; OT2 OD026553; OT2 OD026548; OT2 OD026551; OT2 OD026555); Interagency agreement AOD 16037; Federally Qualified Health Centers HHSN 263201600085U; Data and Research Center: U2C OD023196; Genome Centers (OT2 OD002748; OT2 OD002750; OT2 OD002751); Biobank: U24 OD023121; The Participant Center: U24 OD023176; Participant Technology Systems Center: U24 OD023163; Communications and Engagement: OT2 OD023205; OT2 OD023206; and Community Partners (OT2 OD025277; OT2 OD025315; OT2 OD025337; OT2 OD025276). In addition, the All of Us Research Program would not be possible without the partnership of its participants. All of Us and the All of Us logo are service marks of the US Department of Health and Human Services. E.E.E. is an investigator of the Howard Hughes Medical Institute. We acknowledge the foundational contributions of our friend and colleague, the late Deborah A. Nickerson. Debbie’s years of insightful contributions throughout the formation of the All of Us genomics programme are permanently imprinted, and she shares credit for all of the successes of this programme.

Author information

Authors and affiliations.

Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA

Alexander G. Bick & Henry R. Condon

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA

Ginger A. Metcalf, Eric Boerwinkle, Richard A. Gibbs, Donna M. Muzny, Eric Venner, Kimberly Walker, Jianhong Hu, Harsha Doddapaneni, Christie L. Kovar, Mullai Murugan, Shannon Dugan & Ziad Khan

Vanderbilt Institute of Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA

Kelsey R. Mayo, Jodell E. Linder, Melissa Basford, Ashley Able, Ashley E. Green, Robert J. Carroll, Jennifer Zhang & Yuanyuan Wang

Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA

Lee Lichtenstein, Anthony Philippakis, Sophie Schwartz, M. Morgan T. Aster, Kristian Cibulskis, Andrea Haessly, Rebecca Asch, Aurora Cremer, Kylee Degatano, Akum Shergill, Laura D. Gauthier, Samuel K. Lee, Aaron Hatcher, George B. Grant, Genevieve R. Brandt, Miguel Covarrubias, Eric Banks & Wail Baalawi

Verily, South San Francisco, CA, USA

Shimon Rura, David Glazer, Moira K. Dillon & C. H. Albach

Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA

Robert J. Carroll, Paul A. Harris & Dan M. Roden

All of Us Research Program, National Institutes of Health, Bethesda, MD, USA

Anjene Musick, Andrea H. Ramirez, Sokny Lim, Siddhartha Nambiar, Bradley Ozenberger, Anastasia L. Wise, Chris Lunt, Geoffrey S. Ginsburg & Joshua C. Denny

School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA

I. King Jordan, Shashwat Deepali Nagar & Shivam Sharma

Neuroscience Institute, Institute of Translational Genomic Medicine, Morehouse School of Medicine, Atlanta, GA, USA

Robert Meller

Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA

Mine S. Cicek & Stephen N. Thibodeau

Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Kimberly F. Doheny, Michelle Z. Mawhinney, Sean M. L. Griffith, Elvin Hsu, Hua Ling & Marcia K. Adams

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA

Evan E. Eichler, Joshua D. Smith, Christian D. Frazar, Colleen P. Davis, Karynne E. Patterson, Marsha M. Wheeler, Sean McGee, Mitzi L. Murray, Valeria Vasta, Dru Leistritz, Matthew A. Richardson, Aparna Radhakrishnan & Brenna W. Ehmen

Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA

Evan E. Eichler

Broad Institute of MIT and Harvard, Cambridge, MA, USA

Stacey Gabriel, Heidi L. Rehm, Niall J. Lennon, Christina Austin-Tse, Eric Banks, Michael Gatzen, Namrata Gupta, Katie Larsson, Sheli McDonough, Steven M. Harrison, Christopher Kachulis, Matthew S. Lebo, Seung Hoan Choi & Xin Wang

Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA

Gail P. Jarvik & Elisabeth A. Rosenthal

Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA

Dan M. Roden

Department of Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA

Center for Individualized Medicine, Biorepository Program, Mayo Clinic, Rochester, MN, USA

Stephen N. Thibodeau, Ashley L. Blegen, Samantha J. Wirkus, Victoria A. Wagner, Jeffrey G. Meyer & Mine S. Cicek

Color Health, Burlingame, CA, USA

Scott Topper, Cynthia L. Neben, Marcie Steeves & Alicia Y. Zhou

School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA

Eric Boerwinkle

Laboratory for Molecular Medicine, Massachusetts General Brigham Personalized Medicine, Cambridge, MA, USA

Christina Austin-Tse, Emma Henricks & Matthew S. Lebo

Department of Laboratory Medicine and Pathology, University of Washington School of Medicine, Seattle, WA, USA

Christina M. Lockwood, Brian H. Shirts, Colin C. Pritchard, Jillian G. Buchan & Niklas Krumm

Manuscript Writing Group

  • Alexander G. Bick
  • , Ginger A. Metcalf
  • , Kelsey R. Mayo
  • , Lee Lichtenstein
  • , Shimon Rura
  • , Robert J. Carroll
  • , Anjene Musick
  • , Jodell E. Linder
  • , I. King Jordan
  • , Shashwat Deepali Nagar
  • , Shivam Sharma
  •  & Robert Meller

All of Us Research Program Genomics Principal Investigators

  • Melissa Basford
  • , Eric Boerwinkle
  • , Mine S. Cicek
  • , Kimberly F. Doheny
  • , Evan E. Eichler
  • , Stacey Gabriel
  • , Richard A. Gibbs
  • , David Glazer
  • , Paul A. Harris
  • , Gail P. Jarvik
  • , Anthony Philippakis
  • , Heidi L. Rehm
  • , Dan M. Roden
  • , Stephen N. Thibodeau
  •  & Scott Topper

Biobank, Mayo

  • Ashley L. Blegen
  • , Samantha J. Wirkus
  • , Victoria A. Wagner
  • , Jeffrey G. Meyer
  •  & Stephen N. Thibodeau

Genome Center: Baylor-Hopkins Clinical Genome Center

  • Donna M. Muzny
  • , Eric Venner
  • , Michelle Z. Mawhinney
  • , Sean M. L. Griffith
  • , Elvin Hsu
  • , Marcia K. Adams
  • , Kimberly Walker
  • , Jianhong Hu
  • , Harsha Doddapaneni
  • , Christie L. Kovar
  • , Mullai Murugan
  • , Shannon Dugan
  • , Ziad Khan
  •  & Richard A. Gibbs

Genome Center: Broad, Color, and Mass General Brigham Laboratory for Molecular Medicine

  • Niall J. Lennon
  • , Christina Austin-Tse
  • , Eric Banks
  • , Michael Gatzen
  • , Namrata Gupta
  • , Emma Henricks
  • , Katie Larsson
  • , Sheli McDonough
  • , Steven M. Harrison
  • , Christopher Kachulis
  • , Matthew S. Lebo
  • , Cynthia L. Neben
  • , Marcie Steeves
  • , Alicia Y. Zhou
  • , Scott Topper
  •  & Stacey Gabriel

Genome Center: University of Washington

  • Gail P. Jarvik
  • , Joshua D. Smith
  • , Christian D. Frazar
  • , Colleen P. Davis
  • , Karynne E. Patterson
  • , Marsha M. Wheeler
  • , Sean McGee
  • , Christina M. Lockwood
  • , Brian H. Shirts
  • , Colin C. Pritchard
  • , Mitzi L. Murray
  • , Valeria Vasta
  • , Dru Leistritz
  • , Matthew A. Richardson
  • , Jillian G. Buchan
  • , Aparna Radhakrishnan
  • , Niklas Krumm
  •  & Brenna W. Ehmen

Data and Research Center

  • Lee Lichtenstein
  • , Sophie Schwartz
  • , M. Morgan T. Aster
  • , Kristian Cibulskis
  • , Andrea Haessly
  • , Rebecca Asch
  • , Aurora Cremer
  • , Kylee Degatano
  • , Akum Shergill
  • , Laura D. Gauthier
  • , Samuel K. Lee
  • , Aaron Hatcher
  • , George B. Grant
  • , Genevieve R. Brandt
  • , Miguel Covarrubias
  • , Melissa Basford
  • , Alexander G. Bick
  • , Ashley Able
  • , Ashley E. Green
  • , Jennifer Zhang
  • , Henry R. Condon
  • , Yuanyuan Wang
  • , Moira K. Dillon
  • , C. H. Albach
  • , Wail Baalawi
  •  & Dan M. Roden

All of Us Research Demonstration Project Teams

  • Seung Hoan Choi
  • , Elisabeth A. Rosenthal

NIH All of Us Research Program Staff

  • Andrea H. Ramirez
  • , Sokny Lim
  • , Siddhartha Nambiar
  • , Bradley Ozenberger
  • , Anastasia L. Wise
  • , Chris Lunt
  • , Geoffrey S. Ginsburg
  •  & Joshua C. Denny

Contributions

The All of Us Biobank (Mayo Clinic) collected, stored and plated participant biospecimens. The All of Us Genome Centers (Baylor-Hopkins Clinical Genome Center; Broad, Color, and Mass General Brigham Laboratory for Molecular Medicine; and University of Washington School of Medicine) generated and QCed the whole-genomic data. The All of Us Data and Research Center (Vanderbilt University Medical Center, Broad Institute of MIT and Harvard, and Verily) generated the WGS joint call set, carried out quality assurance and QC analyses and developed the Researcher Workbench. All of Us Research Demonstration Project Teams contributed analyses. The other All of Us Genomics Investigators and NIH All of Us Research Program Staff provided crucial programmatic support. Members of the manuscript writing group (A.G.B., G.A.M., K.R.M., L.L., S.R., R.J.C. and A.M.) wrote the first draft of this manuscript, which was revised with contributions and feedback from all authors.

Corresponding author

Correspondence to Alexander G. Bick .

Ethics declarations

Competing interests.

D.M.M., G.A.M., E.V., K.W., J.H., H.D., C.L.K., M.M., S.D., Z.K., E. Boerwinkle and R.A.G. declare that Baylor Genetics is a Baylor College of Medicine affiliate that derives revenue from genetic testing. Eric Venner is affiliated with Codified Genomics, a provider of genetic interpretation. E.E.E. is a scientific advisory board member of Variant Bio, Inc. A.G.B. is a scientific advisory board member of TenSixteen Bio. The remaining authors declare no competing interests.

Peer review

Peer review information.

Nature thanks Timothy Frayling and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Historic availability of EHR records in All of Us v7 Controlled Tier Curated Data Repository (n = 413,457).

For better visibility, the plot shows growth starting in 2010.

Extended Data Fig. 2 Overview of the Genomic Data Curation Pipeline for WGS samples.

The Data and Research Center (DRC) performs additional single sample quality control (QC) on the data as it arrives from the Genome Centers. The variants from samples that pass this QC are loaded into the Genomic Variant Store (GVS), where we jointly call the variants and apply additional QC. We apply a joint call set QC process, which is stored with the call set. The entire joint call set is rendered as a Hail Variant Dataset (VDS), which can be accessed from the analysis notebooks in the Researcher Workbench. Subsections of the genome are extracted from the VDS and rendered in different formats with all participants. Auxiliary data can also be accessed through the Researcher Workbench. This includes variant functional annotations, joint call set QC results, predicted ancestry, and relatedness. Auxiliary data are derived from GVS (arrow not shown) and the VDS. The Cohort Builder directly queries GVS when researchers request genomic data for subsets of samples. Aligned reads, as cram files, are available in the Researcher Workbench (not shown). The graphics of the dish, gene and computer and the All of Us logo are reproduced with permission of the National Institutes of Health’s All of Us Research Program.

Extended Data Fig. 3 Proportion of allelic frequencies (AF), stratified by computed ancestry with over 10,000 participants.

Bar counts are not cumulative (e.g., “pop AF < 0.01” does not include “pop AF < 0.001”).

Extended Data Fig. 4 Distribution of pathogenic and likely pathogenic ClinVar variants.

Stratified by ancestry and filtered to variants found in fewer than 40 individuals (allele count (AC) < 40) across 245,388 short-read WGS samples.

Extended Data Fig. 5 Ancestry-specific HLA-DQB1 ( rs9273363 ) locus associations in 231,442 unrelated individuals.

Phenome-wide (PheWAS) associations highlight ancestry-specific consequences across ancestries.

Extended Data Fig. 6 Ancestry-specific TCF7L2 ( rs7903146 ) locus associations in 231,442 unrelated individuals.

Phenome-wide (PheWAS) associations highlight diabetic consequences across ancestries.

Supplementary information

Supplementary information.

Supplementary Figs. 1–7, Tables 1–8 and Note.

Reporting Summary

Supplementary Dataset 1.

Associations of ACKR1, HLA-DQB1 and TCF7L2 loci with all Phecodes stratified by genetic ancestry.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

Reprints and permissions

About this article

Cite this article.

The All of Us Research Program Genomics Investigators. Genomic data in the All of Us Research Program. Nature (2024). https://doi.org/10.1038/s41586-023-06957-x

Download citation

Received : 22 July 2022

Accepted : 08 December 2023

Published : 19 February 2024

DOI : https://doi.org/10.1038/s41586-023-06957-x

Our next-generation model: Gemini 1.5

Feb 15, 2024

The model delivers dramatically enhanced performance, with a breakthrough in long-context understanding across modalities.

A note from Google and Alphabet CEO Sundar Pichai:

Last week, we rolled out our most capable model, Gemini 1.0 Ultra, and took a significant step forward in making Google products more helpful, starting with Gemini Advanced . Today, developers and Cloud customers can begin building with 1.0 Ultra too — with our Gemini API in AI Studio and in Vertex AI.

Our teams continue pushing the frontiers of our latest models with safety at the core. They are making rapid progress. In fact, we’re ready to introduce the next generation: Gemini 1.5. It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute.

This new generation also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.

Longer context windows show us the promise of what is possible. They will enable entirely new capabilities and help developers build much more useful models and applications. We’re excited to offer a limited preview of this experimental feature to developers and enterprise customers. Demis shares more on capabilities, safety and availability below.

Introducing Gemini 1.5

By Demis Hassabis, CEO of Google DeepMind, on behalf of the Gemini team

This is an exciting time for AI. New advances in the field have the potential to make AI more helpful for billions of people over the coming years. Since introducing Gemini 1.0 , we’ve been testing, refining and enhancing its capabilities.

Today, we’re announcing our next-generation model: Gemini 1.5.

Gemini 1.5 delivers dramatically enhanced performance. It represents a step change in our approach, building upon research and engineering innovations across nearly every part of our foundation model development and infrastructure. This includes making Gemini 1.5 more efficient to train and serve, with a new Mixture-of-Experts (MoE) architecture.

The first Gemini 1.5 model we’re releasing for early testing is Gemini 1.5 Pro. It’s a mid-size multimodal model, optimized for scaling across a wide range of tasks, and performs at a similar level to 1.0 Ultra , our largest model to date. It also introduces a breakthrough experimental feature in long-context understanding.

Gemini 1.5 Pro comes with a standard 128,000 token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via AI Studio and Vertex AI in private preview.

As we roll out the full 1 million token context window, we’re actively working on optimizations to improve latency, reduce computational requirements and enhance the user experience. We’re excited for people to try this breakthrough capability, and we share more details on future availability below.

These continued advances in our next-generation models will open up new possibilities for people, developers and enterprises to create, discover and build using AI.

Context lengths of leading foundation models

Highly efficient architecture

Gemini 1.5 is built upon our leading research on Transformer and MoE architecture. While a traditional Transformer functions as one large neural network, MoE models are divided into smaller “expert” neural networks.

Depending on the type of input given, MoE models learn to selectively activate only the most relevant expert pathways in their neural networks. This specialization massively enhances the model’s efficiency. Google has been an early adopter and pioneer of the MoE technique for deep learning through research such as Sparsely-Gated MoE, GShard-Transformer, Switch-Transformer, M4 and more.
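As a purely illustrative sketch (not Gemini's actual architecture), top-1 MoE routing looks like this: a learned gate scores the experts for each input, and only the best-scoring expert's network is evaluated:

```python
# Minimal illustration of Mixture-of-Experts routing (not Gemini's
# actual architecture): a gate scores the experts for each input and
# only the top-scoring expert's network is run.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, gate_weights, experts):
    """x: feature vector; gate_weights: one weight row per expert;
    experts: list of callables. Top-1 routing."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    best = max(range(len(experts)), key=lambda i: probs[i])
    # only the selected expert is evaluated, so compute scales with
    # the chosen expert rather than the total parameter count
    return experts[best](x), best

# Two toy "experts": one doubles the input, one negates it.
experts = [lambda v: [2 * t for t in v], lambda v: [-t for t in v]]
gate = [[1.0, 0.0], [0.0, 1.0]]   # expert 0 watches dim 0, expert 1 dim 1
out, chosen = moe_forward([3.0, 0.5], gate, experts)
```

The efficiency claim in the text follows directly: with top-1 routing, each input pays the cost of one expert, however many experts the model holds in total.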

Our latest innovations in model architecture allow Gemini 1.5 to learn complex tasks more quickly and maintain quality, while being more efficient to train and serve. These efficiencies are helping our teams iterate, train and deliver more advanced versions of Gemini faster than ever before, and we’re working on further optimizations.

Greater context, more helpful capabilities

An AI model’s “context window” is made up of tokens, which are the building blocks used for processing information. Tokens can be entire parts or subsections of words, images, videos, audio or code. The bigger a model’s context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful.
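As a rough illustration of how context-window budgeting works in practice, here is a back-of-the-envelope estimator. The ~4-characters-per-token ratio is a common rule of thumb, not Gemini's actual tokenizer, so real counts will differ; treat both the function and its parameters as assumptions for illustration.

```python
def fits_in_context(text, context_window=128_000, chars_per_token=4):
    """Crudely estimate token count and whether a text fits a context window.

    Real tokenizers split on subword units, so this heuristic is only a
    ballpark figure for planning purposes.
    """
    est_tokens = len(text) // chars_per_token
    return est_tokens, est_tokens <= context_window

# ~100,000 five-character words -> roughly 125,000 estimated tokens.
tokens, ok = fits_in_context("word " * 100_000)
print(tokens, ok)  # 125000 True
```

In practice you would use the model provider's own token-counting API rather than a character heuristic, but the budgeting logic is the same: estimate, compare against the window, then trim or chunk the input.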

Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production.

This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens.

Complex reasoning about vast amounts of information

1.5 Pro can seamlessly analyze, classify and summarize large amounts of content within a given prompt. For example, when given the 402-page transcripts from Apollo 11’s mission to the moon, it can reason about conversations, events and details found across the document.

Reasoning across a 402-page transcript: Gemini 1.5 Pro Demo

Gemini 1.5 Pro can understand, reason about and identify curious details in the 402-page transcripts from Apollo 11’s mission to the moon.

Better understanding and reasoning across modalities

1.5 Pro can perform highly sophisticated understanding and reasoning tasks for different modalities, including video. For instance, when given a 44-minute silent Buster Keaton movie, the model can accurately analyze various plot points and events, and even reason about small details in the movie that could easily be missed.

Multimodal prompting with a 44-minute movie: Gemini 1.5 Pro Demo

Gemini 1.5 Pro can identify a scene in a 44-minute silent Buster Keaton movie when given a simple line drawing as reference material for a real-life object.

Relevant problem-solving with longer blocks of code

1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications and give explanations about how different parts of the code work.

Problem solving across 100,633 lines of code | Gemini 1.5 Pro Demo

Gemini 1.5 Pro can reason across 100,000 lines of code, giving helpful solutions, modifications and explanations.

Enhanced performance

When tested on a comprehensive panel of text, code, image, audio and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks used for developing our large language models (LLMs). And when compared to 1.0 Ultra on the same benchmarks, it performs at a broadly similar level.

Gemini 1.5 Pro maintains high levels of performance even as its context window increases. In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is purposely placed within a long block of text, 1.5 Pro found the embedded text 99% of the time, in blocks of data as long as 1 million tokens.
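The NIAH setup described here is straightforward to sketch: hide a known "needle" sentence at a random position inside long filler text, then check whether the model's answer reproduces the hidden fact. This toy harness is a deliberate simplification of the real evaluation, and every name in it is illustrative.

```python
import random

def make_niah_case(filler_sentences, needle, n_sentences=1000, seed=0):
    """Build one Needle-In-A-Haystack case: a long prompt with a hidden fact."""
    rng = random.Random(seed)
    haystack = [rng.choice(filler_sentences) for _ in range(n_sentences)]
    pos = rng.randrange(n_sentences)   # where the needle gets buried
    haystack.insert(pos, needle)
    return " ".join(haystack), pos

def score_retrieval(model_answer, expected_fact):
    """Scoring is simply: did the model reproduce the hidden fact?"""
    return expected_fact.lower() in model_answer.lower()

filler = ["The sky was grey that morning.", "Traffic moved slowly."]
needle = "The magic number is 42."
prompt, pos = make_niah_case(filler, needle)
print(needle in prompt)  # True
```

Scaling this test means growing n_sentences until the prompt approaches the full context window, then sweeping the needle's position, since retrieval quality can vary with both prompt length and needle depth.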

Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. We tested this skill on the Machine Translation from One Book (MTOB) benchmark, which shows how well the model learns from information it’s never seen before. When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content.
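Mechanically, in-context learning of this kind amounts to packing the reference material, any worked examples, and the new query into one long prompt; no model weights change. The sketch below shows the general shape of such a prompt, with placeholder strings standing in for the MTOB materials, whose actual format is not reproduced here.

```python
def build_incontext_prompt(reference_material, examples, query):
    """Assemble one long prompt: reference docs, worked examples, new query.

    Everything the model 'learns' arrives inside the prompt itself.
    """
    parts = [reference_material]
    for src, tgt in examples:
        parts.append(f"English: {src}\nKalamang: {tgt}")
    parts.append(f"English: {query}\nKalamang:")
    return "\n\n".join(parts)

prompt = build_incontext_prompt(
    "<grammar manual text>",           # placeholder for the long reference
    [("hello", "<translation 1>")],    # hypothetical worked example
    "good morning",
)
print(prompt.endswith("Kalamang:"))  # True
```

With a million-token window, the reference_material slot can hold an entire book, which is what makes benchmarks like MTOB feasible without fine-tuning.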

As 1.5 Pro’s long context window is the first of its kind among large-scale models, we’re continuously developing new evaluations and benchmarks for testing its novel capabilities.

For more details, see our Gemini 1.5 Pro technical report.

Extensive ethics and safety testing

In line with our AI Principles and robust safety policies, we’re ensuring our models undergo extensive ethics and safety tests. We then integrate these research learnings into our governance processes and model development and evaluations to continuously improve our AI systems.

Since introducing 1.0 Ultra in December, our teams have continued refining the model, making it safer for a wider release. We’ve also conducted novel research on safety risks and developed red-teaming techniques to test for a range of potential harms.

In advance of releasing 1.5 Pro, we've taken the same approach to responsible deployment as we did for our Gemini 1.0 models, conducting extensive evaluations across areas including content safety and representational harms, and will continue to expand this testing. Beyond this, we’re developing further tests that account for the novel long-context capabilities of 1.5 Pro.

Build and experiment with Gemini models

We’re committed to bringing each new generation of Gemini models to billions of people, developers and enterprises around the world responsibly.

Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on our Google for Developers blog and Google Cloud blog.

We’ll introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model.

Early testers can try the 1 million token context window at no cost during the testing period, though they should expect longer latency times with this experimental feature. Significant improvements in speed are also on the horizon.

Developers interested in testing 1.5 Pro can sign up now in AI Studio, while enterprise customers can reach out to their Vertex AI account team.

Learn more about Gemini’s capabilities and see how it works.


113 Great Research Paper Topics

One of the hardest parts of writing a research paper can be just finding a good topic to write about. Fortunately, we've done the hard work for you and have compiled a list of 113 interesting research paper topics. They've been organized into ten categories and cover a wide range of subjects so you can easily find the best topic for you.

In addition to the list of good research topics, we've included advice on what makes a good research paper topic and how you can use your topic to start writing a great paper.

What Makes a Good Research Paper Topic?

Not all research paper topics are created equal, and you want to make sure you choose a great topic before you start writing. Below are the three most important factors to consider to make sure you choose the best research paper topics.

#1: It's Something You're Interested In

A paper is always easier to write if you're interested in the topic, and you'll be more motivated to do in-depth research and write a paper that really covers the entire subject. Even if a certain research paper topic is getting a lot of buzz right now or other people seem interested in writing about it, don't feel tempted to make it your topic unless you genuinely have some sort of interest in it as well.

#2: There's Enough Information to Write a Paper

Even if you come up with the absolute best research paper topic and you're so excited to write about it, you won't be able to produce a good paper if there isn't enough research about the topic. This can happen for very specific or specialized topics, as well as topics that are too new to have enough research done on them at the moment. Easy research paper topics will always be topics with enough information to write a full-length paper.

Trying to write a research paper on a topic that doesn't have much research on it is incredibly hard, so before you decide on a topic, do a bit of preliminary searching and make sure you'll have all the information you need to write your paper.

#3: It Fits Your Teacher's Guidelines

Don't get so carried away looking at lists of research paper topics that you forget any requirements or restrictions your teacher may have put on research topic ideas. If you're writing a research paper on a health-related topic, deciding to write about the impact of rap on the music scene probably won't be allowed, but there may be some sort of leeway. For example, if you're really interested in current events but your teacher wants you to write a research paper on a history topic, you may be able to choose a topic that fits both categories, like exploring the relationship between the US and North Korea. No matter what, always get your research paper topic approved by your teacher first before you begin writing.

113 Good Research Paper Topics

Below are 113 good research topics to help you get started on your paper. We've organized them into ten categories to make it easier to find the type of research paper topics you're looking for.

Arts/Culture

  • Discuss the main differences in art from the Italian Renaissance and the Northern Renaissance.
  • Analyze the impact a famous artist had on the world.
  • How is sexism portrayed in different types of media (music, film, video games, etc.)? Has the amount/type of sexism changed over the years?
  • How has the music of slaves brought over from Africa shaped modern American music?
  • How has rap music evolved in the past decade?
  • How has the portrayal of minorities in the media changed?

Current Events

  • What have been the impacts of China's one child policy?
  • How have the goals of feminists changed over the decades?
  • How has the Trump presidency changed international relations?
  • Analyze the history of the relationship between the United States and North Korea.
  • What factors contributed to the current decline in the rate of unemployment?
  • What have been the impacts of states which have increased their minimum wage?
  • How do US immigration laws compare to immigration laws of other countries?
  • How have the US's immigration laws changed in the past few years/decades?
  • How has the Black Lives Matter movement affected discussions and views about racism in the US?
  • What impact has the Affordable Care Act had on healthcare in the US?
  • What factors contributed to the UK deciding to leave the EU (Brexit)?
  • What factors contributed to China becoming an economic power?
  • Discuss the history of Bitcoin or other cryptocurrencies.
Education

  • Do students in schools that eliminate grades do better in college and their careers?
  • Do students from wealthier backgrounds score higher on standardized tests?
  • Do students who receive free meals at school get higher grades compared to when they weren't receiving a free meal?
  • Do students who attend charter schools score higher on standardized tests than students in public schools?
  • Do students learn better in same-sex classrooms?
  • How does giving each student access to an iPad or laptop affect their studies?
  • What are the benefits and drawbacks of the Montessori Method?
  • Do children who attend preschool do better in school later on?
  • What was the impact of the No Child Left Behind act?
  • How does the US education system compare to education systems in other countries?
  • What impact do mandatory physical education classes have on students' health?
  • Which methods are most effective at reducing bullying in schools?
  • Do homeschoolers who attend college do as well as students who attended traditional schools?
  • Does offering tenure increase or decrease quality of teaching?
  • How does college debt affect future life choices of students?
  • Should graduate students be able to form unions?

  • What are different ways to lower gun-related deaths in the US?
  • How and why have divorce rates changed over time?
  • Is affirmative action still necessary in education and/or the workplace?
  • Should physician-assisted suicide be legal?
  • How has stem cell research impacted the medical field?
  • How can human trafficking be reduced in the United States/world?
  • Should people be able to donate organs in exchange for money?
  • Which types of juvenile punishment have proven most effective at preventing future crimes?
  • Has the increase in US airport security made passengers safer?
  • Analyze the immigration policies of certain countries and how they are similar and different from one another.
  • Several states have legalized recreational marijuana. What positive and negative impacts have they experienced as a result?
  • Do tariffs increase the number of domestic jobs?
  • Which prison reforms have proven most effective?
  • Should governments be able to censor certain information on the internet?
  • Which methods/programs have been most effective at reducing teen pregnancy?
Health

  • What are the benefits and drawbacks of the Keto diet?
  • How effective are different exercise regimes for losing weight and maintaining weight loss?
  • How do the healthcare plans of various countries differ from each other?
  • What are the most effective ways to treat depression?
  • What are the pros and cons of genetically modified foods?
  • Which methods are most effective for improving memory?
  • What can be done to lower healthcare costs in the US?
  • What factors contributed to the current opioid crisis?
  • Analyze the history and impact of the HIV/AIDS epidemic.
  • Are low-carbohydrate or low-fat diets more effective for weight loss?
  • How much exercise should the average adult be getting each week?
  • Which methods are most effective to get parents to vaccinate their children?
  • What are the pros and cons of clean needle programs?
  • How does stress affect the body?
History

  • Discuss the history of the conflict between Israel and the Palestinians.
  • What were the causes and effects of the Salem Witch Trials?
  • Who was responsible for the Iran-Contra situation?
  • How have New Orleans and the government's response to natural disasters changed since Hurricane Katrina?
  • What events led to the fall of the Roman Empire?
  • What were the impacts of British rule in India?
  • Was the atomic bombing of Hiroshima and Nagasaki necessary?
  • What were the successes and failures of the women's suffrage movement in the United States?
  • What were the causes of the Civil War?
  • How did Abraham Lincoln's assassination impact the country and reconstruction after the Civil War?
  • Which factors contributed to the colonies winning the American Revolution?
  • What caused Hitler's rise to power?
  • Discuss how a specific invention impacted history.
  • What led to Cleopatra's fall as ruler of Egypt?
  • How has Japan changed and evolved over the centuries?
  • What were the causes of the Rwandan genocide?

Religion

  • Why did Martin Luther decide to split with the Catholic Church?
  • Analyze the history and impact of a well-known cult (Jonestown, Manson family, etc.)
  • How did the sexual abuse scandal impact how people view the Catholic Church?
  • How has the Catholic church's power changed over the past decades/centuries?
  • What are the causes behind the rise in atheism/agnosticism in the United States?
  • What influences in Siddhartha's life resulted in him becoming the Buddha?
  • How has media portrayal of Islam/Muslims changed since September 11th?

Science/Environment

  • How has the earth's climate changed in the past few decades?
  • How has the use and elimination of DDT affected bird populations in the US?
  • Analyze how the number and severity of natural disasters have increased in the past few decades.
  • Analyze deforestation rates in a certain area or globally over a period of time.
  • How have past oil spills changed regulations and cleanup methods?
  • How has the Flint water crisis changed water regulation safety?
  • What are the pros and cons of fracking?
  • What impact has the Paris Climate Agreement had so far?
  • What have NASA's biggest successes and failures been?
  • How can we improve access to clean water around the world?
  • Does ecotourism actually have a positive impact on the environment?
  • Should the US rely on nuclear energy more?
  • What can be done to save amphibian species currently at risk of extinction?
  • What impact has climate change had on coral reefs?
  • How are black holes created?
Technology

  • Are teens who spend more time on social media more likely to suffer anxiety and/or depression?
  • How will the loss of net neutrality affect internet users?
  • Analyze the history and progress of self-driving vehicles.
  • How has the use of drones changed surveillance and warfare methods?
  • Has social media made people more or less connected?
  • What progress has currently been made with artificial intelligence?
  • Do smartphones increase or decrease workplace productivity?
  • What are the most effective ways to use technology in the classroom?
  • How is Google search affecting our intelligence?
  • When is the best age for a child to begin owning a smartphone?
  • Has frequent texting reduced teen literacy rates?

How to Write a Great Research Paper

Even great research paper topics won't give you a great research paper if you don't hone your topic before and during the writing process. Follow these three tips to turn good research paper topics into great papers.

#1: Figure Out Your Thesis Early

Before you start writing a single word of your paper, you first need to know what your thesis will be. Your thesis is a statement that explains what you intend to prove/show in your paper. Every sentence in your research paper will relate back to your thesis, so you don't want to start writing without it!

As some examples, if you're writing a research paper on whether students learn better in same-sex classrooms, your thesis might be "Research has shown that elementary-age students in same-sex classrooms score higher on standardized tests and report feeling more comfortable in the classroom."

If you're writing a paper on the causes of the Civil War, your thesis might be "While the dispute between the North and South over slavery is the most well-known cause of the Civil War, other key causes include differences in the economies of the North and South, states' rights, and territorial expansion."

#2: Back Every Statement Up With Research

Remember, this is a research paper you're writing, so you'll need to use lots of research to make your points. Every statement you give must be backed up with research, properly cited the way your teacher requested. You're allowed to include opinions of your own, but they must also be supported by the research you give.

#3: Do Your Research Before You Begin Writing

You don't want to start writing your research paper and then learn that there isn't enough research to back up the points you're making, or, even worse, that the research contradicts the points you're trying to make!

Get most of your research on your good research topics done before you begin writing. Then use the research you've collected to create a rough outline of what your paper will cover and the key points you're going to make. This will help keep your paper clear and organized, and it'll ensure you have enough research to produce a strong paper.

What's Next?

Are you also learning about dynamic equilibrium in your science class? We break this sometimes tricky concept down so it's easy to understand in our complete guide to dynamic equilibrium.

Thinking about becoming a nurse practitioner? Nurse practitioners have one of the fastest growing careers in the country, and we have all the information you need to know about what to expect from nurse practitioner school.

Want to know the fastest and easiest ways to convert between Fahrenheit and Celsius? We've got you covered! Check out our guide to the best ways to convert Celsius to Fahrenheit (or vice versa).


Christine graduated from Michigan State University with degrees in Environmental Biology and Geography and received her Master's from Duke University. In high school she scored in the 99th percentile on the SAT and was named a National Merit Finalist. She has taught English and biology in several countries.




Records Related to Unidentified Anomalous Phenomena (UAPs) at the National Archives

The National Archives and Records Administration (NARA) has established an “Unidentified Anomalous Phenomena Records Collection,” per sections 1841–1843 of the 2024 National Defense Authorization Act (Public Law 118-31).

Please explore the links below to find out more about records related to unidentified anomalous phenomena (UAPs)/unidentified flying objects (UFOs) in NARA’s holdings. All links to items in the National Archives Catalog are downloadable and can be republished with attribution to the National Archives and Records Administration.

Still Pictures and Photographs UAP Related Records

RG 255: Records of the National Aeronautics and Space Administration

  • Items from the series “Photographs Relating to Agency Activities, Facilities and Personnel, 1960–1991” (National Archives Identifier: 5956182, Local Identifier: 255-GS)

RG 342: Records of U.S. Air Force Commands, Activities, and Organizations, 1900–2003

  • Items include 342-AF-63708AC, 342-AF-163969AC, 342-AF-34920AC, 342-AF-34923AC, 342-AF-34919AC, 342-AF-163969AC, and 342-AF-34919AC. A finding aid for these items is available in the Still Picture Research Room.
  • Items from the series “Black and White and Color Photographs of U.S. Air Force Activities, Facilities, and Personnel, Domestic and Foreign” (National Archives Identifier: 542326, Local Identifier: 342-B)

RG 341: Records of Headquarters U.S. Air Force (Air Staff)

  • “Project “BLUE BOOK”, 1954–1966.” (National Archives Identifier: 542184, Local Identifier: 341-PBB)

Moving Images and Sound UAP Related Records

RG 111: Records of the Office of the Chief Signal Officer

  • MAJ. GEN. JOHN A. SAMFORD'S STATEMENT ON "FLYING SAUCERS", PENTAGON, WASHINGTON, D.C (National Archives Identifier: 25738, Local Identifier: 111-LC-30875)

RG 255: Records of the National Aeronautics and Space Administration, 1903–2006

  • Walter Cronkite and Gordon Cooper on UFOs (National Archives Identifier: 86027191, Local Identifier: 255-PAOa-807-AAE)
  • An Executive Summary of the Greatest Secret of the 20th Century (National Archives Identifier: 5833930, Local Identifier: 255-GOLDIN-233)

RG 263: Records of the Central Intelligence Agency, 1894–2002

  • Unidentified Flying Objects, 1956 (National Archives Identifier: 617148, Local Identifier: 263-95). This film is edited, with sound.
  • Unidentified Flying Objects, 1956 (National Archives Identifier: 5954651 and 617916, Local Identifier: 263-124)

RG 306: Records of the U.S. Information Agency, 1900–2003

  • Doctor Edward Condon, University of Colorado Physicist Studying Unidentified Flying Objects (National Archives Identifier: 127614, Local Identifier: 306-EN-S-T-2808)
  • Alderman Interview with Doctor Page on Unidentified Flying Objects (National Archives Identifier: 130003, Local Identifier: 306-EN-W-T-8990)
  • Foreign Press Center Briefing with B. Maccabee, L. Koss, J. Shandera, and B. Hopkins (National Archives Identifier: 56103, Local Identifier: 306-FP-17)

RG 330: Records of the Office of the Secretary of Defense

  • The Case of the Flying Saucer (National Archives Identifier: 2386432, Local Identifier: 330a.85)
  • Unidentified Flying Object (UFO) Sighting (National Archives Identifier: 614788, Local Identifier: 330-DVIC-653)

RG 341: Records of Headquarters U.S. Air Force (Air Staff)

  • “Project Blue Book Motion Picture Films, 1950–1966” (National Archives Identifier: 61934, Local Identifier: 341-PBB)
  • “Sound Recordings Relating to Project Blue Book Unidentified Flying Object (UFO) Investigations, 1953–1967” (National Archives Identifier: 1142703, Local Identifier: 341-PBBa)
  • “Moving Images Relating to “The Roswell Reports” Source Data Research Files, 1946–1996” (National Archives Identifier: 566658, Local Identifier: 341-ROSWELL)
  • “Sound Recordings Relating to “The Roswell Reports”, 1991–1996” (National Archives Identifier: 566843, Local Identifier: 341-ROSWELLa)

RG 342: Records of U.S. Air Force Commands, Activities, and Organizations

  • DFD Avrocar I Progress Report, February 1, 1958–May 1959 (National Archives Identifier: 68170, Local Identifier: 342-USAF-29668)
  • Disc Flight Development, Avrocar I Progress Report, May 2, 1959–April 12, 1960 (National Archives Identifier: 68175, Local Identifier: 342-USAF-29673)
  • Avrocar Continuation Test Program and Terrain Test Program, June 1, 1960–June 14, 1961 (National Archives Identifier: 68405, Local Identifier: 342-USAF-31135)
  • Friend, Foe, or Fantasy, 1966 (National Archives Identifier: 69861, Local Identifier: 342-USAF-41040)
  • UFO Interview, 1966 (National Archives Identifier: 70511, Local Identifier: 342-USAF-42990)
  • USAF UFO sightings, California, 1952–1975 (National Archives Identifier: 72035, Local Identifier: 342-USAF-49377)

RG 517: Records of the U.S. Agency for Global Media

  • UFO Sighting Over Alaska, January 13, 1987 (National Archives Identifier: 262327376, Local Identifier: 517-VOAa-87-306)
  • Science World 1030, 2002 (National Archives Identifier: 77179268, Local Identifier: 517-BBG-50046)

Donated Collections:

  • Unidentified Flying Objects (UFOs): Fact or Fiction, November 1974 (National Archives Identifier: 2838871, Local Identifier: 200.1572)
  • Paramount News [Mar. 7] (1951) Vol. 10, No. 52 (National Archives Identifier: 99581, Local Identifier: PARA-PN-10.52)
  • Paramount News [July 30] (1952) Vol. 11, No. 100 (National Archives Identifier: 99731, Local Identifier: PARA-PN-11.100)
  • Universal Newsreel Volume 22, Release 276, August 22, 1949 (National Archives Identifier: 234273290, Local Identifier: UN-UN-22-276)
  • Universal Newsreel Volume 25, Release 586, August 11, 1952 (National Archives Identifier: 234273597, Local Identifier: UN-UN-25-586)

Textual Records and Microfilm UAP Related Records

RG 64: Records of the National Archives and Records Administration  

  • Project Blue Book: UFO Sightings (National Archives Identifier: 40027753)

RG 181: Records of Navy Installations Command, Navy Regions, Naval Districts, and Shore Establishments

  • Collection of A8-2 Information, 1959 (National Archives Identifier: 291645977 )

RG 237: Records of The Federal Aviation Administration

  • Information Releases Relating to Unidentified Flying Object, 1986 (FAA—Japan Airlines Flight 1628) (National Archives Identifier: 733667 )
  • Gemini VII Air-to-Ground Transcript Volume I (National Archives Identifier: 5011500 )
  • Records of Investigations of Unidentified Flying Objects (UFOs) Relating to the Office of Special Investigations, 1948–1968 (National Archives Identifier: 45484701 )
  • Project Blue Book Administrative Files, 1947–1969 (National Archives Identifier: 595175 )
  • Copies of the Case Files of the 4602D Air Intelligence Service Squadron on Sightings of Unidentified Flying Objects (UFOs), 1954–1956 (National Archives Identifier: 23857158 )
  • Case Files of the 4602 D Intelligence Service Squadron on Sightings of Unidentified Flying Objects (UFOs) (National Archives Identifier: 23857157 )
  • Roswell Report Source Files, 1987–1996 (National Archives Identifier: 17618564 )
  • Air Intelligence Reports, 1948–1953 (National Archives Identifier: 23857122 )
  • Project Blue Book Artifacts, 1952–1969 (National Archives Identifier: 23857160 )
  • Sanitized Version of Project Blue Book Case Files on Sightings of Unidentified Flying Objects, 1947–1969 (National Archives Identifier: 597821 )
  • Case files on Sightings of Unidentified Flying Objects (UFOs), 1953-1960 (National Archives Identifier: 23857159 )
  • Project Blue Book Case Files on Sightings of Unidentified Flying Objects (UFOs), June 1947–December 1969 (National Archives Identifier: 595466 )

RG 342: Records of the U. S. Air Force Commands, Activities, and Organizations  

  • AFR 80-17/OCAMA-TAFB Sup Unidentified Flying Objects (UFO) (National Archives Identifier: 37294296 )
  • Obsolete During 1969: 4600 Air Base Wing Supplement 1 to Air Force Regulation 80-17, Unidentified Flying Objects (UFO), 10 January 1967; Superseded, 15 April 1969 (National Archives Identifier: 68875395 )
  • REL-2-4-1 UFOs 1965 (National Archives Identifier: 311003081 )
  • File 5: 2, Community Relations, 1970 (National Archives Identifier: 47323287 )
  • 471.6 Guided Missiles, 1 January 1952 (National Archives Identifier: 333334712 )
  • 471.6 Guided Missiles, 1 July 1952 (National Archives Identifier: 333334717 )

National Archives Blog Posts and Articles

  • Project BLUE BOOK - Unidentified Flying Objects (Updated 2020)
  • National Archives News: Public Interest in UFOs Persists 50 Years After Project Blue Book Termination (2019)
  • Featured Document Display: 50 Years Ago: Government Stops Investigating UFOs (2019)
  • Pieces of History: Saucers Over Washington: the History of Project Blue Book (2019)
  • Pieces of History: INVASION! (of privacy) (2018)
  • Pieces of History: UFOs: Natural Explanations (2018)
  • Pieces of History: UFOs: Man-Made, Made Up, and Unknown (2018)
  • National Archives News: Do Records Show Proof of UFOs? (2018)
  • The Unwritten Record: The Roswell Reports: What crashed in the desert? (2014)
  • The Unwritten Record: Avrocar: The U.S. Military’s Flying Saucer (2014)
  • The NDC Blog: What on Earth Is It? (2014)
  • Pieces of History: Flying Saucers, Popular Mechanics, and the National Archives (2013)
  • The Unwritten Record: Project Blue Book: Home Movies in UFO Reports (2013)
  • The Unwritten Record: Project Blue Book: Spotting UFOs in the Film Record (2013)
  • [VIDEO]: UFO Project Blue Book at National Archives Museum


The best research databases for computer science

Besides the interdisciplinary research databases Web of Science and Scopus, there are also academic databases dedicated specifically to computer science. We have compiled a list of the top four research databases with a special focus on computer science to help you find research papers, scholarly articles, and conference papers fast.

1. ACM Digital Library

ACM Digital Library is the clear number one when it comes to academic databases for computer science. The ACM Full-Text Collection currently has 540,000+ articles, while the ACM Guide to Computing Literature holds more than 2.8 million bibliographic entries.

  • Coverage: 2.8+ million articles
  • Abstracts: ✔
  • Related articles: ✘
  • References: ✔
  • Cited by: ✔
  • Full text: ✔ (requires institutional subscription)
  • Export formats: BibTeX, EndNote
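For reference, a BibTeX export from such a database is a plain-text entry along these lines (a made-up example for illustration, not a real record):

```bibtex
@inproceedings{doe2024example,
  author    = {Jane Doe and John Smith},
  title     = {An Example Paper on Distributed Systems},
  booktitle = {Proceedings of an ACM Conference},
  year      = {2024},
  pages     = {1--12},
  publisher = {ACM}
}
```

Reference managers read these fields to generate citations in whatever style you need.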

Search interface of the ACM Digital Library

Pro tip: Use a reference manager like Paperpile to keep track of all your sources. Paperpile integrates with ACM Digital Library and many popular databases, so you can save references and PDFs directly to your library using the Paperpile buttons and later cite them in thousands of citation styles.


2. IEEE Xplore Digital Library

IEEE Xplore holds more than 4.7 million research articles from the fields of electrical engineering, computer science, and electronics. It covers not only articles published in scholarly journals but also conference papers, technical standards, and some books.

  • Coverage: 4.7+ million articles
  • Export formats: BibTeX, RIS

Search interface of IEEE Xplore

3. dblp computer science bibliography

Hosted at the University of Trier, Germany, dblp has become an indispensable resource in the field of computer science. Its index covers journal articles, conference and workshop proceedings, and monographs.

  • Coverage: 4.3 million articles
  • Abstracts: ✘
  • References: ✘
  • Cited by: ✘
  • Full text: ✘ (Links to publisher websites available)
  • Export formats: RIS, BibTeX

Search interface of dblp

4. Springer Lecture Notes in Computer Science (LNCS)

Springer's Lecture Notes in Computer Science is the number one publishing source for conference proceedings covering all areas of computer science.

  • Coverage: 415,000+ articles
  • Export formats: RIS, EndNote, BibTeX

Search interface of Springer Lecture Notes in Computer Science

Frequently asked questions about computer science research databases

Microsoft Academic was a free academic search engine developed by Microsoft Research. It had more than 13.9 million articles indexed. It was shut down in 2022.


Pics and it didn't happen —

OpenAI collapses media reality with Sora, a photorealistic AI video generator

Hello, cultural singularity—soon, every video you see online could be completely fake.

Benj Edwards - Feb 16, 2024 5:23 pm UTC

Snapshots from three videos generated using OpenAI's Sora.

On Thursday, OpenAI announced Sora, a text-to-video AI model that can generate 60-second-long photorealistic HD video from written descriptions. While it's only a research preview that we have not tested, it reportedly creates synthetic video (but not audio yet) at a fidelity and consistency greater than any text-to-video model available at the moment. It's also freaking people out.


"It was nice knowing you all. Please tell your grandchildren about my videos and the lengths we went to to actually record them," wrote Wall Street Journal tech reporter Joanna Stern on X.

"This could be the 'holy shit' moment of AI," wrote Tom Warren of The Verge.

"Every single one of these videos is AI-generated, and if this doesn't concern you at least a little bit, nothing will," tweeted YouTube tech journalist Marques Brownlee.

For future reference—since this type of panic will some day appear ridiculous—there's a generation of people who grew up believing that photorealistic video must be created by cameras. When video was faked (say, for Hollywood films), it took a lot of time, money, and effort to do so, and the results weren't perfect. That gave people a baseline level of comfort that what they were seeing remotely was likely to be true, or at least representative of some kind of underlying truth. Even when the kid jumped over the lava, there was at least a kid and a room.

The prompt that generated the video above: "A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors."

Technology like Sora pulls the rug out from under that kind of media frame of reference. Very soon, every photorealistic video you see online could be 100 percent false in every way. Moreover, every historical video you see could also be false. How we confront that as a society and work around it while maintaining trust in remote communications is far beyond the scope of this article, but I tried my hand at offering some solutions back in 2020, when all of the tech we're seeing now seemed like a distant fantasy to most people.

In that piece, I called the moment that truth and fiction in media become indistinguishable the "cultural singularity." It appears that OpenAI is on track to bring that prediction to pass a bit sooner than we expected.

Prompt: Reflections in the window of a train traveling through the Tokyo suburbs.

OpenAI has found that, like other AI models that use the transformer architecture, Sora scales with available compute. Given far more powerful computers behind the scenes, AI video fidelity could improve considerably over time. In other words, this is the "worst" AI-generated video is ever going to look. There's no synchronized sound yet, but that might be solved in future models.

How (we think) they pulled it off

AI video synthesis has progressed by leaps and bounds over the past two years. We first covered text-to-video models in September 2022 with Meta's Make-A-Video. A month later, Google showed off Imagen Video. And just 11 months ago, an AI-generated version of Will Smith eating spaghetti went viral. In May of last year, what was previously considered to be the front-runner in the text-to-video space, Runway Gen-2, helped craft a fake beer commercial full of twisted monstrosities, generated in two-second increments. In earlier video-generation models, people pop in and out of reality with ease, limbs flow together like pasta, and physics doesn't seem to matter.

Sora (which means "sky" in Japanese) appears to be something altogether different. It's high-resolution (1920x1080), can generate video with temporal consistency (maintaining the same subject over time) that lasts up to 60 seconds, and appears to follow text prompts with a great deal of fidelity. So, how did OpenAI pull it off?

OpenAI doesn't usually share insider technical details with the press, so we're left to speculate based on theories from experts and information given to the public.

OpenAI says that Sora is a diffusion model, much like DALL-E 3 and Stable Diffusion. It generates a video by starting off with noise and "gradually transforms it by removing the noise over many steps," the company explains. It "recognizes" objects and concepts listed in the written prompt and pulls them out of the noise, so to speak, until a coherent series of video frames emerges.
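That denoising loop can be sketched in a few lines of plain Python. This is only a toy: a trivial shrink function stands in for the trained network, so it illustrates the iterative structure of diffusion sampling, not how Sora actually works.

```python
import random

random.seed(0)

def fake_denoiser(x, t):
    # Stand-in for the learned network. A real diffusion model predicts
    # the noise present in x at step t; this toy just shrinks the sample.
    return [v * 0.1 for v in x]

def sample(n_values, steps=50):
    # Start from pure Gaussian noise...
    x = [random.gauss(0.0, 1.0) for _ in range(n_values)]
    # ...then gradually transform it by removing "noise" over many steps.
    for t in reversed(range(steps)):
        predicted_noise = fake_denoiser(x, t)
        x = [v - n for v, n in zip(x, predicted_noise)]
    return x

pixels = sample(64 * 64 * 3)  # one tiny "frame," flattened
print(len(pixels))  # 12288
```

In a real model, each denoising step is also conditioned on the text prompt, which is what steers the noise toward the requested scene.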

Sora is capable of generating videos all at once from a text prompt, extending existing videos, or generating videos from still images. It achieves temporal consistency by giving the model "foresight" of many frames at once, as OpenAI calls it, solving the problem of ensuring a generated subject remains the same even if it falls out of view temporarily.

OpenAI represents video as collections of smaller groups of data called "patches," which the company says are similar to tokens (fragments of a word) in GPT-4. "By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions, and aspect ratios," the company writes.
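To make the "patch" idea concrete, here is a sketch of cutting a video into non-overlapping spacetime patches. The nested-list representation and the patch sizes are illustrative assumptions, not details OpenAI has published.

```python
def to_patches(video, pt, ph, pw):
    """Split a video (a list of frames, each frame a 2D grid of values)
    into non-overlapping spacetime patches of pt frames x ph x pw pixels."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    patches = []
    for t0 in range(0, T, pt):          # blocks of frames (time)
        for y0 in range(0, H, ph):      # blocks of rows (height)
            for x0 in range(0, W, pw):  # blocks of columns (width)
                patch = [
                    [row[x0:x0 + pw] for row in video[t][y0:y0 + ph]]
                    for t in range(t0, t0 + pt)
                ]
                patches.append(patch)
    return patches

# A 4-frame, 8x8 "video" becomes (4/2) * (8/4) * (8/4) = 8 patches.
video = [[[0] * 8 for _ in range(8)] for _ in range(4)]
patches = to_patches(video, pt=2, ph=4, pw=4)
print(len(patches))  # 8
```

Each patch then plays the role a token plays in a language model: a fixed-size unit the transformer can attend over, regardless of the source video's duration, resolution, or aspect ratio.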

An important tool in OpenAI's bag of tricks is that its use of AI models is compounding. Earlier models are helping to create more complex ones. Sora follows prompts well because, like DALL-E 3, it utilizes synthetic captions that describe scenes in the training data, generated by another AI model like GPT-4V. And the company is not stopping here. "Sora serves as a foundation for models that can understand and simulate the real world," OpenAI writes, "a capability we believe will be an important milestone for achieving AGI."

One question on many people's minds is what data OpenAI used to train Sora. OpenAI has not revealed its dataset, but based on what people are seeing in the results, it's possible OpenAI is using synthetic video data generated in a video game engine in addition to sources of real video (say, scraped from YouTube or licensed from stock video libraries). Nvidia's Dr. Jim Fan, who is a specialist in training AI with synthetic data, wrote on X, "I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!" Until confirmed by OpenAI, however, that's just speculation.

