What is a research repository, and why do you need one?

Last updated

31 January 2024

Reviewed by

Miroslav Damyanov

Without one organized source of truth, research can be left in silos, making it incomplete, redundant, and useless when it comes to gaining actionable insights.

A research repository can act as one cohesive place where teams can collate research in meaningful ways. This helps streamline the research process and ensures the insights gathered make a real difference.

  • What is a research repository?

A research repository acts as a centralized database where information is gathered, stored, analyzed, and archived in one organized space.

In this single source of truth, raw data, documents, reports, observations, and insights can be viewed, managed, and analyzed. This allows teams to organize raw data into themes, gather actionable insights, and share those insights with key stakeholders.

Ultimately, the research repository can make the research you gain much more valuable to the wider organization.

  • Why do you need a research repository?

Information gathered through the research process can be disparate, challenging to organize, and difficult to obtain actionable insights from.

Some of the most common challenges researchers face include the following:

Information being collected in silos

No single source of truth

Research being conducted multiple times unnecessarily

No seamless way to share research with the wider team

Reports getting lost and going unread

Without a way to store information effectively, it can become disparate and inconclusive, lacking utility. This can lead to research being completed by different teams without new insights being gathered.

A research repository can streamline the information gathered to address those key issues, improve processes, and boost efficiency. Among other things, an effective research repository can:

Optimize processes: it can ensure the process of storing, searching, and sharing information is streamlined and optimized across teams.

Minimize redundant research: when all information is stored in one accessible place for all relevant team members, the chances of research being repeated are significantly reduced. 

Boost insights: having one source of truth boosts the chances of being able to properly analyze all the research that has been conducted and draw actionable insights from it.

Provide comprehensive data: there’s less risk of gaps in the data when it can be easily viewed and understood. The overall research is also likely to be more comprehensive.

Increase collaboration: given that information can be more easily shared and understood, there’s a higher likelihood of better collaboration and positive actions across the business.

  • What to include in a research repository

Including the right things in your research repository from the start can help ensure that it provides maximum benefit for your team.

Here are some of the things that should be included in a research repository:

An overall structure

There are many ways to organize the data you collect. To organize it in a way that’s valuable for your organization, you’ll need an overall structure that aligns with your goals.

You might wish to organize projects by research type, project, department, or when the research was completed. This will help you better understand the research you’re looking at and find it quickly.

Including information about the research—such as authors, titles, keywords, a description, and dates—can make searching through raw data much faster and make the organization process more efficient.
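As a rough illustration of how this kind of descriptive information speeds up searching, here is a minimal Python sketch. The `ResearchRecord` schema, the `search` helper, and the sample records are all hypothetical and are not tied to any particular repository tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ResearchRecord:
    """Descriptive metadata for one item in the repository (hypothetical schema)."""
    title: str
    authors: list[str]
    description: str
    completed_on: date
    keywords: set[str] = field(default_factory=set)


def search(records: list[ResearchRecord], term: str) -> list[ResearchRecord]:
    """Return records whose title, description, or keywords mention the term."""
    needle = term.lower()
    return [
        r for r in records
        if needle in r.title.lower()
        or needle in r.description.lower()
        or any(needle in k.lower() for k in r.keywords)
    ]


# Made-up records, for illustration only.
records = [
    ResearchRecord(
        title="Onboarding interviews",
        authors=["A. Researcher"],
        description="Ten interviews about the first-week experience.",
        completed_on=date(2024, 1, 15),
        keywords={"onboarding", "interviews"},
    ),
    ResearchRecord(
        title="Checkout usability test",
        authors=["B. Researcher"],
        description="Moderated sessions on the payment flow.",
        completed_on=date(2023, 11, 2),
        keywords={"checkout", "usability"},
    ),
]

matches = search(records, "onboarding")
print([r.title for r in matches])
```

Because every record carries the same fields, a single search covers titles, descriptions, and keywords without anyone having to open the raw files.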

All key data and information

It’s essential to include all of the key data you’ve gathered in the repository, including supplementary materials. This prevents information gaps, and stakeholders can easily stay informed. You’ll need to include the following information, if relevant:

Research and journey maps

Tools and templates (such as discussion guides, email invitations, consent forms, and participant tracking)

Raw data and artifacts (such as videos, CSV files, and transcripts)

Research findings and insights in various formats (including reports, decks, maps, images, and tables)

Version control

It’s important to use a system that has version control. This ensures the changes (including updates and edits) made by various team members can be viewed and reversed if needed.

  • What makes a good research repository?

The following key elements make up a good research repository that’s useful for your team:

Access: all key stakeholders should be able to access the repository to ensure there’s an effective flow of information.

Actionable insights: a well-organized research repository should help you get from raw data to actionable insights faster.

Effective searchability: searching through large amounts of research can be very time-consuming. To save time, maximize search and discoverability by clearly labeling and indexing information.

Accuracy: the research in the repository must be accurately completed and organized so that it can be acted on with confidence.

Security: when dealing with data, it’s also important to consider security regulations. For example, any personally identifiable information (PII) must be protected. Depending on the information you gather, you may need password protection, encryption, and access control so that only those who need to read the information can access it.
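One way to reduce exposure of PII is to redact obvious identifiers before a transcript is stored. The sketch below is a simplified illustration in Python. The two patterns and the `redact` helper are assumptions made for this example, and they catch only email addresses and phone-like numbers, so they do not replace access controls, encryption, or human review.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs
# broader coverage (names, addresses, IDs) and human review.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


# Made-up transcript line, for illustration only.
transcript = "You can reach me at jane.doe@example.com or +1 555-010-9999."
print(redact(transcript))
```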

  • How to create a research repository

Getting started with a research repository doesn’t have to be convoluted or complicated. Taking time at the beginning to set up the repository in an organized way can help keep processes simple further down the line.

The following six steps should simplify the process:

1. Define your goals

Before diving in, consider your organization’s goals. All research should align with these business goals, and they can help inform the repository.

As an example, your goal may be to deeply understand your customers and provide a better customer experience. Setting out this goal will help you decide what information should be collated into your research repository and how it should be organized for maximum benefit.

2. Choose a platform

When choosing a platform, consider the following:

Will it offer a single source of truth?

Is it simple to use?

Is it relevant to your project?

Does it align with your business’s goals?

3. Choose an organizational method

To ensure you’ll be able to easily search for the documents, studies, and data you need, choose an organizational method that will speed up this process.

Choosing whether to organize your data by project, date, research type, or customer segment will make a big difference later on.

4. Upload all materials

Once you have chosen the platform and organization method, it’s time to upload all the research materials you have gathered. This also means including supplementary materials and any other information that will provide a clear picture of your customers.

Keep in mind that the repository is a single source of truth. All materials that relate to the project at hand should be included.

5. Tag or label materials

Adding metadata to your materials will help ensure you can easily search for the information you need. While this process can take time (and can be tempting to skip), it will pay off in the long run.

The right labeling will help all team members access the materials they need. It will also prevent redundant research, which wastes valuable time and money.
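To show why labeling pays off, here is a small Python sketch of a tag index. The file names, tags, and helper functions are hypothetical. The idea is simply that once each material carries tags, finding everything on a topic becomes a lookup rather than a manual hunt.

```python
from collections import defaultdict


def build_tag_index(materials: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each tag to the set of material names that carry it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, tags in materials.items():
        for tag in tags:
            index[tag.lower()].add(name)
    return dict(index)


def find(index: dict[str, set[str]], *tags: str) -> set[str]:
    """Return materials that carry every requested tag."""
    sets = [index.get(t.lower(), set()) for t in tags]
    return set.intersection(*sets) if sets else set()


# Hypothetical materials and tags, for illustration only.
materials = {
    "interview-01.txt": {"onboarding", "pricing"},
    "survey-q3.csv": {"pricing", "churn"},
    "usability-test.mp4": {"onboarding"},
}

index = build_tag_index(materials)
print(sorted(find(index, "pricing")))
print(sorted(find(index, "onboarding", "pricing")))
```

A team member who searches for "pricing" before starting a new study would see at once that two existing materials already cover the topic.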

6. Share insights

For research to be impactful, you’ll need to gather actionable insights. It’s simpler to spot trends, see themes, and recognize patterns when using a repository. These insights can be shared with key stakeholders for data-driven decision-making and positive action within the organization.

  • Different types of research repositories

There are many different types of research repositories used across organizations. Here are some of them:

Data repositories: these are used to store large datasets to help organizations deeply understand their customers and other information.

Project repositories: data and information related to a specific project may be stored in a project-specific repository. This can help users understand what is and isn’t related to a project.

Government repositories: research funded by governments or public resources may be stored in government repositories. This data is often publicly available to promote transparent information sharing.

Thesis repositories: academic repositories can store information relevant to theses. This allows the information to be made available to the general public.

Institutional repositories: some organizations and institutions, such as universities, hospitals, and other companies, have repositories to store all relevant information related to the organization.

  • Build your research repository in Dovetail

With Dovetail, building an insights hub is simple. It functions as a single source of truth where research can be gathered, stored, and analyzed in a streamlined way.

1. Get started with Dovetail

Dovetail is a scalable platform that helps your team easily share the insights you gather for positive actions across the business.

2. Assign a project lead

It’s helpful to have a clear project lead to create the repository. This makes it clear who is responsible and avoids duplication.

3. Create a project

To keep track of data, simply create a project. This is where you’ll upload all the necessary information.

You can create projects based on customer segments, specific products, research methods, or when the research was conducted. The project breakdown will relate back to your overall goals and mission.

4. Upload data and information

Now, you’ll need to upload all of the necessary materials. These might include data from customer interviews, sales calls, product feedback, usability testing, and more. You can also upload supplementary information.

5. Create a taxonomy

Create a taxonomy to organize the data effectively by ensuring that each piece of information will be tagged and organized.

When creating a taxonomy, consider your goals and how they relate to your customers. Ensure those tags are relevant and helpful.

6. Tag key themes

Once the taxonomy is created, tag each piece of information to ensure you can easily filter data, group themes, and spot trends and patterns.

With Dovetail, automatic clustering helps quickly sort through large amounts of information to uncover themes and highlight patterns. Sentiment analysis can also help you track positive and negative themes over time.

7. Share insights

With Dovetail, it’s simple to organize data by themes to uncover patterns and share impactful insights. You can share these insights with the wider team and key stakeholders, who can use them to make customer-informed decisions across the organization.

8. Use Dovetail as a source of truth

Use your Dovetail repository as a source of truth for new and historic data to keep data and information in one streamlined and efficient place. This will help you better understand your customers and, ultimately, deliver a better experience for them.


Data Repository: What it is, Types and Guide


Collecting data isn’t that hard, but what’s hard is creating and maintaining a data repository. Even harder is making sense out of a data repository.  

The concept of a data repository has grown in popularity as a way to manage and use this data efficiently. A data repository is a centralized storage site for data that allows for easy access, data management, and analysis.

Here we start with a definition of a data repository, then cover how to create a research insights repository and the benefits of having one.

What is a data repository?

A data repository is a data library or data archive. The term may refer to a large database management system or to several databases that collect, manage, and store sensitive data sets for data analysis, sharing, and reporting.

Authorized users can easily access and retrieve data by using query and search tools, which helps with research and decision-making. It gives a complete and unified view of the data by combining data from different sources, like databases, apps, and external systems.

Data can be collected and stored in different ways. Aggregated data, for example, is usually collected from multiple sources or segments of a business. The data can then be stored in a structured or unstructured manner and later tagged with metadata.

The data repository uses structured organization methods, standardized schemas, and metadata to ensure that the data stays consistent and easy to find. It has tools for storing, managing, and protecting data, such as compression, indexing, access controls, encryption, and reporting.

Data repositories generally maintain subscriptions to licensed data resources so that their users can access the information.

Data repository types

Security is crucial as more organizations adopt data repositories to manage and store data. Data repositories are generally categorized into four types:

Data warehouse

This is the largest repository type, where data is collected from several business segments or sources. The data stored here is generally used for analysis and reporting, which helps data users or teams make the correct decisions for their business or project.

Data lake

In this repository, data can take any form: structured, semi-structured, or unstructured. It is a huge storehouse of unstructured data categorized and labeled with metadata.

Data lakes came into existence mainly because of the limitations of data warehouses. A data lake helps you achieve better data governance and fuller control over the data it holds.

Data mart

Data marts are often confused with data warehouses. However, they serve different functions.

A data mart is a subset of the data warehouse focused on a particular subject, department, or other specific area. Because the data is stored for a specific area, users can quickly access insights without spending much time searching an entire data warehouse.

Data cube

This repository contains the most complex data. Data cubes can be described as multidimensional extensions of tables, and they’re generally used to represent data that is too complex to be described by just tables, rows, and columns.

A data cube is useful when the analysis spans three or more dimensions. Here, we’ll focus on data repositories used in market research.
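To make the data cube idea more concrete, the following Python sketch treats a handful of rows as a tiny three-dimensional cube and sums it along chosen dimensions. The dimension names, the sample rows, and the `roll_up` helper are assumptions for illustration; dedicated analytics tools implement this far more efficiently.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, units_sold).
rows = [
    ("North", "Basic", "Q1", 10),
    ("North", "Pro", "Q1", 4),
    ("South", "Basic", "Q1", 7),
    ("North", "Basic", "Q2", 12),
    ("South", "Pro", "Q2", 5),
]

DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Sum units over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, units in rows:
        totals[tuple(dims[p] for p in positions)] += units
    return dict(totals)


print(roll_up(rows, ["region"]))
print(roll_up(rows, ["product", "quarter"]))
```

The same rows answer different questions depending on which dimensions are kept, which is the property that flat tables of rows and columns struggle to express.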

Benefits of using a research data repository

Using research data repositories has many benefits for both researchers and the scientific community as a whole. Here are some significant benefits:

Greater visibility

Having data saved in data repositories enables you to view it anytime. Keeping it siloed in Excel sheets or in applications the team doesn’t use reduces its visibility and usability, which wastes time and resources.

Enhanced discoverability

Saving data in a digital format makes it more accessible. Just search for the piece of data you’re looking for, and voila! The metadata stored alongside the data also helps others understand the larger context and make more sense of it.

A data repository contains many pieces of data. However, it’s more than just a warehouse. Discrete datasets are joined such that you can derive interesting insights into your area of research. You can generate various types of reports using the same datasets. 

For instance, if you conduct an online survey and collect data from your target audience, you can generate a comparison report to compare responses from various demographic groups. You can also generate trend reports to understand how people’s choices have changed over time. Both of these reports use the same data.
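The point that both reports draw on the same data can be sketched in a few lines of Python. The survey rows and the `average_by` helper below are invented for illustration and do not reflect how any specific survey tool computes its reports.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (year, age_group, satisfaction score 1-5).
responses = [
    (2022, "18-34", 4),
    (2022, "35-54", 3),
    (2023, "18-34", 5),
    (2023, "35-54", 3),
    (2023, "18-34", 4),
]


def average_by(responses, position):
    """Average the score, grouped by the field at the given tuple position."""
    groups = defaultdict(list)
    for row in responses:
        groups[row[position]].append(row[2])
    return {key: round(mean(scores), 2) for key, scores in sorted(groups.items())}


comparison_report = average_by(responses, 1)  # grouped by demographic group
trend_report = average_by(responses, 0)       # grouped by year
print(comparison_report)
print(trend_report)
```

Only the grouping field changes between the two calls; the underlying responses are stored once.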

Gain insights from multiple sources of data

Integrating data repositories with other applications lets you see a multi-dimensional view of your data. For instance, you can analyze the historical survey data along with the actual sales data to understand the accuracy of insights gained in the past.

How to create one using online tools

Creating data repositories for research data is simple with online tools. Whether you conduct your research using surveys, communities, focus groups, or any other method, here are the steps to create one.

Create a questionnaire

Many online tools allow you to drag and drop question types. You can create a survey in under five minutes, or use a ready-made survey template to save time. Customize the template to your needs, and you’re ready.

Brand your survey

Customize the header and footer, and add a logo to look more professional. You can also choose a font style and color that suit your brand voice. Branding your surveys increases the chances of getting more responses.

Distribute your survey

Many tools offer different ways to distribute your survey, such as sending it by email, embedding it on your website, or sharing it on social media sites. You can also generate a QR code or let your audience answer questions using a mobile app.

Analyze the data

Finally, once you have collected your data, generating the reports is just a matter of time. Use tools that let you create dashboards and generate reports with ease.

How QuestionPro helps with data repositories

QuestionPro is a powerful online survey and research platform that collects, analyzes, and manages data. It is mostly used to create surveys and collect data, and it also helps establish and maintain data repositories. QuestionPro helps with data repository management in several ways:

  • Data Collection: QuestionPro lets you develop and send surveys to collect data. Surveys can use multiple choice, rating scales, open-ended questions, and more. This collection process feeds important data into your data repositories.
  • Data Management: With QuestionPro, you can effectively organize and manage the data you gather. It filters, categorizes, and validates data to ensure accuracy and quality. These management tools help keep a data repository well organized.
  • Data Analysis: QuestionPro has built-in tools to help you examine and visualize your data. You can create reports, charts, and graphs based on survey answers to help you find trends, patterns, and insights. The analysis results can be saved in your data repository.
  • Real-time Reporting: Real-time reporting lets you view and analyze your data as it comes in. After collecting replies, you can instantly generate reports to assess trends and progress and make data-driven decisions.
  • Data Security: QuestionPro prioritizes data security. To prevent data breaches, it encrypts data, transfers it securely, and restricts access. This keeps the data in your repository safe and protects users’ privacy.
  • Data Integration: QuestionPro integrates with Excel, Google Sheets, and SPSS. This connection lets you import external data or survey responses into your data repositories for analysis and storage.

The data collection, customer data integration, management, analysis, and security features in QuestionPro can help you manage your repository. It’s useful for data repository management because it centralizes data collection, storage, and analysis.


If you need any help conducting research or creating a data repository, connect with our team of experts. We can guide you through the process and help you make the most of your data.


Frequently Asked Questions (FAQ)

Your data repository should suit your needs. Choose a repository that is popular and relevant to your research domain, and make sure it supports your data format.

Data repositories are managed digital environments that specialize in gathering, characterizing, distributing, and tracking research data. Sharing data in a repository is a best practice that is frequently mandated by federal authorities.


Finding Datasets, Data Repositories, and Data Standards

This online guide contains resources for finding data repositories for data preservation and access and locating datasets for reuse. The guide was developed as an online companion for the class Resources for Finding and Sharing Research Data. If you are NIH or HHS staff, please check out the NIH Library training schedule for upcoming classes.

If you need a one-on-one or group consultation on locating data repositories and datasets, please contact the NIH Library.

Some content of this guide is adapted from:

  • Read, Kevin; Surkis, Alisa (2018): Research Data Management Teaching Toolkit. figshare. ( https://figshare.com/articles/Research_Data_Management_Teaching_Toolkit/5042998 )  This work is licensed under Attribution 4.0 International (CC BY 4.0).

Navigation:

  • Resources to Locate Data Repositories
  • Resources for Data Sharing for Intramural NIH Researchers
  • Issues to Consider with Data Repositories
  • Searching Across Data Repositories
  • Generalist Repositories
  • Data Journals
  • Databases Linked to Datasets
  • Issues to Consider with Datasets
  • Data Standards and Common Data Elements (CDEs)
  • Data Repositories

  • Domain-specific repositories
  • Generalist repositories
  • Information from the BMIC tables described above, listing repositories for sharing scientific data and repositories for accessing scientific data, can also be found at Sharing.nih.gov.
  • The portal covers data registries from across many academic disciplines.
  • Users can search by keyword or browse repositories by subject, content type, or country.
  • Choose Databases to search and browse data repositories.
  • Choose Collections to view data repositories, standards, and policies related to various topics.
  • Submit a Data Management and Sharing plan (DMSP) outlining how scientific data and any accompanying metadata will be managed and shared, taking into account any potential restrictions or limitations.
  • Comply with the Data Management and Sharing plan approved by the funding Institute or Center (IC).
  • Data Management & Sharing Policy Overview: Learn more about the 2023 Data Management & Sharing Policy, and find resources to assist with compliance.
  • Allowable Costs for Data Management and Sharing
  • Elements of an NIH Data Management and Sharing Plan
  • Selecting a Repository for Data Resulting from NIH-Supported Research
  • Protecting Privacy When Sharing Human Research Participant Data
  • Responsible Management and Sharing of American Indian/Alaska Native Participant Data
  • Research associated with a ZIA
  • Research associated with a clinical protocol that will undergo IC Initial Scientific Review
  • The plans will address the elements indicated in the Intramural Research Program Data Management and Sharing (IRP DMS) Plan template. The template addresses six NIH-recommended core elements and allows for the inclusion of IC-specific elements: Intramural Data Management and Sharing Plan Template (PDF)
  • See the 2023 NIH Data Management and Sharing Policy page in the OIR Sourcebook for additional guidance and resources.
  • See the library guide Data Management and Sharing Plan Resources for a detailed list of DMSP resources and IC-specific contacts.
  • Genomic Data Sharing Policy
  • NIH Institute and Center Data Sharing Policies
  • Intramural Human Data Sharing Policy
  • Other Sharing Policies
  • Find more information on Intramural Data Sharing from the NIH Office of Intramural Research.
  • Visit Sharing.nih.gov for guidance on Selecting a Data Repository and a list of potential Repositories for Sharing Scientific Data.

Issues to consider when finding a data repository to preserve and share data:

  • Required Repositories: Check the funder/publisher policies to see if there are required repositories where the data must be deposited.
  • You may need to anonymize and/or aggregate the data before sharing, or access to the data may need to be limited to researchers with specific permissions.
  • Intellectual Property:  Be aware of who owns the intellectual property and if there are any licensing restrictions.
  • Required Data Standards: Be aware of the data standards (such as metadata and data formats) required for depositing the data in the repository.
  • Deposit and Storage Costs: Be aware of any costs associated with depositing/storing the data.

Find additional guidance at Sharing.nih.gov for Selecting a Data Repository.

  • Indexes datasets using the metadata descriptions that come directly from the dataset web pages using schema.org structure.
  • Contains more than 31 million datasets from more than 4,600 internet domains.
  • About half of these datasets come from .com domains, but .org and governmental domains are also well represented.
  • Dataset results are now also listed in general Google search results, according to a February 2023 blog post.
  • Filter results by date range, data type, source type (article or data repository), and source.
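For readers unfamiliar with the schema.org structure mentioned above, the following Python sketch builds a minimal JSON-LD description using the schema.org `Dataset` type. Every value is a placeholder invented for this example. Which properties a given search service actually requires is not stated in this guide, so check that service's own documentation.

```python
import json

# All values below are placeholders for illustration, not a real dataset.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example customer survey responses",
    "description": "Anonymized responses from a hypothetical 2023 survey.",
    "keywords": ["survey", "customer experience"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Research Team"},
}

# JSON-LD like this is typically embedded in the dataset's web page inside a
# <script type="application/ld+json"> tag so that crawlers can read it.
json_ld = json.dumps(dataset_metadata, indent=2)
print(json_ld)
```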

Here’s a closer look at a few major cross-disciplinary repositories highlighted on the NIH Data Sharing Resources: Generalist Repositories page. 

  • Browse or search and filter datasets by geographical location, subject, journal, or institution.
  • Filter by Item Type: Dataset.
  • Filter by Type: Dataset to view only dataset results.

The NIH Office of Data Science Strategy (ODSS) announced the Generalist Repository Ecosystem Initiative (GREI), which includes seven established generalist repositories that will work together to establish consistent metadata, develop use cases for data sharing, train and educate researchers on FAIR data and the importance of data sharing, and more. A series of recorded webinars is offered to learn about GREI and generalist repositories.

  • Some will also store the dataset.
  • Others provide recommendations of where to store the data.
  • Usually peer-reviewed.
  • GigaScience: An open access, open data, open peer-review journal from Oxford University Press focusing on “big data” research from the life and biomedical sciences.
  • Scientific Data: Scientific Data is a peer-reviewed, open-access journal from Springer Nature that publishes descriptions of scientifically valuable datasets and research that advances the sharing and reuse of scientific data.
  • Sources of Dataset Peer Review: University of Edinburgh maintains a list of peer-reviewed data publications.
  • The EU-funded FOSTER portal (e-learning platform for training resources on Open Science) provides a list of Open Data Journals.
  • Walters, William H. 2020. “Data Journals: Incentivizing Data Access and Documentation Within the Scholarly Communication System”. Insights 33 (1): 18. DOI: http://doi.org/10.1629/uksg.510: Provides list of data journals.
  • PubMed: Use the filter option “Article Attribute” > “Associated Data” to only view results with related data links. Data filters were originally added to PubMed and PubMed Central in 2018.
  • Web of Science: When viewing search results in Web of Science (All Databases), choose the Associated Data option under Quick Filters to view only search results that mention a data set, data study, or data repository in the Data Citation Index. The Data Citation Index includes over 14 million research data sets and 1.6 million data studies from over 450 international data repositories in the sciences, social sciences, and arts and humanities.

Issues to consider when re-using datasets include:

  • Who is the author of the dataset? What is their institutional affiliation?
  • Is there a peer-reviewed publication associated with the dataset?
  • Licensing: Check any license restrictions for the data. Many repositories will list the type of license the data is covered by (usually Creative Commons or Open Data Commons licenses).
  • Use the format defined by a style guide, like APA (See APA style manual examples for datasets).
  • In EndNote, you can define a reference as a dataset. EndNote will then format the reference into the correct dataset citation format for the selected style.
  • Learn more: NYU Libraries, Data Sources: How to Cite Data & Statistics

See the ELIXIR Research Data Management Kit (RDMkit) guide on Existing Data for additional considerations and resources when locating existing datasets for reuse.

Data/metadata standards and CDEs can help to make data more FAIR (findable, accessible, interoperable, and re-usable; see FORCE11 The FAIR Data Principles).

  • DCC Disciplinary Metadata: Collections of metadata standards organized by discipline.
  • FAIRsharing.org: An online catalog that includes over 1600 data and metadata standards.
  • NIH CDE Repository: The NIH Common Data Elements (CDE) Repository provides access to structured human and machine-readable definitions of data elements that have been recommended or required by NIH Institutes and Centers and other organizations for use in research and for other purposes.

Research Data Repositories: Finding and Storing Data

We have three guides about data: Which one do you need?

  • Research Data Literacy 101 This guide covers research data generally, what data is, the difference between data and statistics, understanding open data, library databases that offer statistics and data, and other overview topics.
  • Research Data Repositories: Finding and Storing Data This guide is an annotated list of data repositories by subject, where researchers can deposit their data in line with government standards and find datasets from research done by others. Note: For library databases that offer statistics and data, see Research Data Literacy 101: Find Data and Statistics.
  • Research Data Management This guide covers how to create a Data Management Plan, including funders, metadata, and other important aspects. Includes help using DMPTool.

Often, the places to find and store data are one and the same. Researchers place the data they collect into general or disciplinary repositories, while other researchers search those repositories for data and datasets on their topic. Some repositories are costly, while others are considered "open" and offer data freely for anyone to download.

Data and Statistics Are Not Equivalent 

Although both terms are commonly used synonymously, they are, in fact, very different. Before you start searching for either, think about which one best applies to your needs. 

  • Data: collected raw numbers or bits of information that have not been analyzed or organized.
  • Statistics: the product of collected data after it has been analyzed or organized to help derive meaning from it.

The National Library of Medicine has a great resource full of other data-related definitions. 

What to Consider When Choosing a Data Repository

A data repository is a storage space for researchers to deposit data sets associated with their research. And if you’re an author seeking to comply with a journal or funder data sharing policy, you’ll need to identify a suitable repository for your data.

An open access data repository openly stores data in a way that allows immediate user access to anyone. There are no limitations to the repository access.

When choosing a repository for your data, keep in mind the following:

  • Your funder or journal is likely to have specific guidelines for sharing your data.
  • Does the repository issue a persistent identifier (like a DOI), or let you link deposits to your ORCID account?
  • Does the repository have a plan to preserve data in perpetuity?
  • Does the repository charge to store your data? There may also be a cost to access datasets.
  • Is the repository certified or indexed?
  • Is the repository completely open or are there restrictions to access?
  • Consider the FAIR data principles: data should be Findable, Accessible, Interoperable, and Re-usable.

NIH guidelines for selecting a data repository

3 Ways to use Google to find Data

Google has a Dataset Search! Here is a video tutorial on how to use this search tool .

You can search for specific file types in Google, for example CSV files for datasets. Typing filetype:csv into the search bar tells Google to return only results of that file type. For example, (poverty AND ohio) filetype:xls will return XLS (Excel) files mentioning poverty in Ohio.

Limit search results by web domain by typing into Google: site:.gov (YOUR TOPIC HERE). This limits results to datasets, files, etc. from websites in that domain. You could even use .org for professional organizations.
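If you run these searches often, the operators above can be assembled programmatically. The following is a minimal Python sketch, not an official Google tool: the function names build_query and search_url are hypothetical, and the search URL pattern is an assumption based on how Google search links commonly look.

```python
from urllib.parse import urlencode


def build_query(topic, filetype=None, site=None):
    """Combine a topic with optional filetype: and site: operators."""
    parts = [topic]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query):
    """Return a URL-encoded search link for the query (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# The example from the text: Excel files mentioning poverty in Ohio.
print(build_query("(poverty AND ohio)", filetype="xls"))

# Restrict the same topic to government websites.
print(build_query("poverty ohio", site=".gov"))

# Both operators together, as a clickable link.
print(search_url(build_query("poverty ohio", filetype="csv", site=".gov")))
```

The same query strings can simply be pasted into the search bar; the helper only saves retyping the operators.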


What a difference a data repository makes: Six ways depositing data maximizes the impact of your science

Data is key to verification, replication, reuse, and enhanced understanding of research conclusions. When your data is in a repository—instead of an old hard drive, say, or even a Supporting Information file—its impact and its relevance are magnified. Here are six ways that putting your data in a public repository can help your research go further.

1. You can’t lose data that’s in a public data repository

Have you ever lost track of a dataset? Maybe you’ve upgraded your computer or moved to a new institution. Maybe you deleted a file by mistake, or simply can’t remember the name of the file you’re looking for. No matter the cause, lost data can be embarrassing and time consuming. You’re unable to supply requested information to journals during the submission process or to readers after publication. Future meta analyses or systematic reviews are impossible. And you may end up redoing experiments in order to move forward with your line of inquiry. With data securely deposited in a repository with a unique DOI for tracking, archival standards to prevent loss, and metadata and readme materials to make sure your data is used correctly, fulfilling journal requests or revisiting past work is easy.

2. Public data repositories support understanding, reanalysis and reuse

Transparently posting raw data to a public repository supports trustworthy, reproducible scientific research. Insight into the data and analysis gives readers a deeper understanding of published research articles. Offering the opportunity for others to interpret results demonstrates integrity and opens new avenues for discussion and collaboration. Machine-readable data formatting allows the work to be incorporated into future systematic reviews or meta analyses, expanding its usefulness.

3. Public data repositories facilitate discovery

Even the best data can’t be used unless it can be found. Detailed metadata, database indexing, and bidirectional linking to and from related articles help to make data in public repositories easily searchable, so that it reaches the readers who need it most, maximizing the impact and influence of the study as a whole.

4. Public data repositories reflect the true value of data

Data shouldn’t be treated like an ancillary by-product of a research article. Data is research. And researchers deserve academic credit for collecting, capturing and curating the data they generate through their work. Public repositories help to illustrate the true importance and lasting relevance of datasets by assigning them their own unique DOI, distinct from that of related research articles, so that datasets can accumulate citations in their own right.

5. Public data demonstrates rigor

There’s no better way to illustrate the rigor of your results than explaining exactly how you achieved them. Sharing data lets you demonstrate your credibility and inspires confidence in readers by contextualizing results and facilitating reproducibility.

6. Research with data in public data repositories attracts more citations

A 2020 study of more than 500,000 published research articles found articles that link to data in a public repository have a 25% higher citation rate on average than articles where data is available on request or as Supporting Information. The precise reasons for the association remain unclear. Are researchers who deposit carefully curated data in a repository also more likely to produce rigorous, citation-worthy research? Are researchers with the time and resources to devote to data curation and deposition more established in their careers, and therefore more highly cited? Are readers more likely to cite research when they trust that they can verify the conclusions with data? Perhaps some combination? 

What do you see as the most important reason for posting data in a repository?

How to build a research repository: a step-by-step guide to getting started

Done right, research repositories have the potential to be incredibly powerful assets for any research-driven organisation. But when it comes to building one, it can be difficult to know where to start.

As a result, we see tons of teams jumping in without clearly defining upfront what they actually hope to achieve with the repository, and ending up disappointed when it doesn't deliver the results.

Aside from being frustrating and demoralising for everyone involved, building an unused repository is a waste of money, time, and opportunity.

So how can you avoid this?

In this post, we provide some practical tips to define a clear vision and strategy for your repository in order to help you maximise your chances of success.

🚀 This post is also available as a free, interactive Miro template that you can use to work through each exercise outlined below - available for download here .

Defining the end goal for your repository

To start, you need to define your vision.

Only by setting a clear vision can you start to map out the road towards realising it.

Your vision provides something you can hold yourself accountable to - acting as a north star. As you move forward with the development and roll out of your repository, this will help guide you through important decisions like what tool to use, and who to engage with along the way.

The reality is that building a research repository should be approached like any other product, aiming for progress over perfection with each iteration of the solution.

Starting with a very simple question like "what do we hope to accomplish with our research repository within the first 12 months?" is a great starting point.

You need to be clear on the problems that you’re looking to solve - and the desired outcomes from building your repository - before deciding on the best approach.

Building a repository is an investment, so it’s important to consider not just what you want to achieve in the next few weeks or months, but also in the longer term to ensure your repository is scalable.

Whatever the ultimate goal (or goals), capturing the answer to this question will help you to focus on outcomes over output .

🔎 How to do this in practice…

1. Complete some upfront discovery

In a previous post we discussed how to conduct some upfront discovery to help with understanding today’s biggest challenges when it comes to accessing and leveraging research insights.

⏰ You should aim to complete your upfront discovery within a couple of hours, spending 20-30 mins interviewing each stakeholder (we recommend talking with at least 5 people, both researchers and non-researchers).

2. Prioritise the problems you want to solve

Start by spending some time reviewing the current challenges your team and organisation are facing when it comes to leveraging research and insights.

You can run a simple affinity mapping exercise to highlight the common themes from your discovery and prioritise the top 1-3 problems that you’d like to solve using your repository.

💡 Example challenges might include:

  • Struggling to understand what research has already been conducted to date, leading to teams repeating previous research
  • Looking for better ways to capture and analyse raw data e.g. user interviews
  • Spending lots of time packaging up research findings for wider stakeholders
  • Drowning in research reports and artefacts, and in need of a better way to access and leverage existing insights
  • Lacking engagement in research from key decision makers across the organisation

⏰ You should aim to confirm what you want to focus on solving with your repository within 45-60 mins (based on a group of up to 6 people).

3. Consider what future success looks like

Next you want to take some time to think about what success looks like one year from now, casting your mind to the future and capturing what you’d like to achieve with your repository in this time.

A helpful exercise is to imagine the headline quotes for an internal company-wide newsletter talking about the impact that your new research repository has had across the business.

The ‘ Jobs to be done ’ framework provides a helpful way to format the outputs for this activity, helping you to empathise with what the end users of your repository might expect to experience by way of outcomes.

💡 Example headlines might include:

  • “When starting a new research project, people are clear on the research that’s already been conducted, so that we’re not repeating previous research” (Research Manager)
  • “During a study, we’re able to quickly identify and share the key insights from our user interviews to help increase confidence around what our customers are currently struggling with” (Researcher)
  • “Our designers are able to leverage key insights when designing the solution for a new user journey or product feature, helping us to derisk our most critical design decisions” (Product Design Director)
  • “Our product roadmap is driven by customer insights, and building new features based on opinion is now a thing of the past” (Head of Product)
  • “We’ve been able to use the key research findings from our research team to help us better articulate the benefits of our product and increase the number of new deals” (Sales Lead)
  • “Our research is being referenced regularly by C-level leadership at our quarterly townhall meetings, which has helped to raise the profile of our team and the research we’re conducting” (Head of Research)

Ask yourself what your own headlines might say, and add them to the front page of a newspaper image.

You then want to discuss each of these headlines across the group and fold these into a concise vision statement for your research repository - something memorable and inspirational that you can work towards achieving.

💡Example vision statements:

  • ‘Our research repository makes it easy for anyone at our company to access the key learnings from our research, so that key decisions across the organisation are driven by insight’
  • ‘Our research repository acts as a single source of truth for all of our research findings, so that we’re able to query all of our existing insights from one central place’
  • ‘Our research repository helps researchers to analyse and synthesise the data captured from user interviews, so that we’re able to accelerate the discovery of actionable insights’
  • ‘Our research repository is used to drive collaborative research across researchers and teams, helping to eliminate data silos, foster innovation and advance knowledge across disciplines’
  • ‘Our research repository empowers people to make a meaningful impact with their research by providing a platform that enables the translation of research findings into remarkable products for our customers’

⏰ You should aim to agree the vision for your repository within 45-60 mins (based on a group of up to 6 people).

Creating a plan to realise your vision

Having a vision alone isn't going to make your repository a success. You also need to establish a set of short-term objectives, which you can use to plan a series of activities to help you make progress towards this.

Focus your thinking around the more immediate future, and what you want to achieve within the first 3 months of building your repository.

Alongside the short-term objectives you’re going to work towards, it’s also important to consider how you’ll measure your progress, so that you can understand what’s working well, and what might require further attention. 

Agreeing a set of success metrics is key to holding yourself accountable to making a positive impact with each new iteration. This also helps you to demonstrate progress to others from as early on in the process as possible.

1. Establish 1-3 short term objectives

Take your vision statement and consider the first 1-3 results that you want to achieve within the first 3 months of working towards this.

These objectives need to be realistic and achievable given the 3 month timeframe, so that you’re able to build some momentum and set yourself up for success from the very start of the process.

💡Example objectives:

  • Improve how insights are defined and captured by the research team
  • Revisit our existing research to identify what data we want to add to our new research repository
  • Improve how our research findings are organised, considering how our repository might be utilised by researchers and wider teams
  • Initial group of champions bought-in and actively using our research repository
  • Improve the level of engagement with our research from wider teams and stakeholders

Capture your 3 month objectives underneath your vision, leaving space to consider the activities that you need to complete in order to realise each of these.

2. Identify how to achieve each objective

Each activity that you commit to should be something that an individual or small group of people can comfortably achieve within the first 3 months of building your repository.

Come up with some ideas for each objective, then prioritise the activities that will deliver the biggest impact for the least effort.

💡Example activities:

  • Agree a definition for strategic and tactical insights to help with identifying the previous data that we want to add to our new research repository
  • Revisit the past 6 months of research and capture the data we want to add to our repository as an initial body of knowledge
  • Create the first draft taxonomy for our research repository, testing this with a small group of wider stakeholders
  • Launch the repository with an initial body of knowledge to a group of wider repository champions
  • Start distributing a regular round up of key insights stored in the repository

You can add your activities to a simple kanban board, ordering your ‘To do’ column with the most impactful tasks up top, and using this to track your progress and make visible who’s working on which tasks throughout the initial build of your repository.
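If you'd rather prototype the board before choosing a tool, the ordering and tracking described above can be sketched in a few lines of Python. Everything here is illustrative: the 1-5 impact and effort scores are hypothetical, and the ranking rule (biggest impact first, least effort as the tie-breaker) is just one reasonable assumption about how to combine the two.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    impact: int  # hypothetical score, 1 (low) to 5 (high)
    effort: int  # hypothetical score, 1 (easy) to 5 (hard)


def prioritise(activities):
    """Order by biggest impact first, breaking ties with the least effort."""
    return sorted(activities, key=lambda a: (-a.impact, a.effort))


def move(board, name, source, target):
    """Move the named activity from one kanban column to another."""
    for activity in board[source]:
        if activity.name == name:
            board[source].remove(activity)
            board[target].append(activity)
            return
    raise ValueError(f"{name!r} not found in {source!r}")


activities = [
    Activity("Agree a definition for insights", impact=4, effort=1),
    Activity("Capture the past 6 months of research", impact=5, effort=5),
    Activity("Create the first draft taxonomy", impact=5, effort=3),
    Activity("Start a regular insights round up", impact=3, effort=2),
]

# The 'To do' column starts with the most impactful tasks up top.
board = {"To do": prioritise(activities), "In progress": [], "Done": []}
move(board, "Create the first draft taxonomy", "To do", "In progress")

for column, items in board.items():
    print(column, [a.name for a in items])
```

Swapping the sort key for an impact-to-effort ratio is an equally valid choice; what matters is that the group agrees the rule before scoring.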

This is something you can come back to and revisit as you move through the wider roll out of your repository, adding any new activities to the board and moving these through to ‘Done’ as they’re completed.

⚠️ At this stage it’s also important to call out any risks or dependencies that could derail your progress towards completing each activity, such as capacity, or requiring support from other individuals or teams.

3. Agree how you’ll measure success

Lastly, you’ll need a way to measure success as you work on the activities you’ve associated with each of your short term objectives.

We recommend choosing 1-3 metrics that you can measure and track as you move forward with everything, considering ways to capture and review the data for each of these.

⚠️ Instead of thinking of these metrics as targets, we recommend using them to measure your progress - helping you to identify any activities that aren’t going so well and might require further attention.

💡Example success metrics:

  • Usage metrics: number of insights captured, active users of the repository, number of searches performed, number of insights viewed and shared
  • User feedback: usability feedback for your repository, user satisfaction (CSAT), NPS aka how likely someone is to recommend using your repository
  • Research impact: number of stakeholder requests for research, time spent responding to requests, level of confidence, repeatable value of research, amount of duplicated research, time spent onboarding new joiners
  • Wider impact: mentions of your research (and repository) internally, links to your research findings from other initiatives e.g. discovery projects, product roadmaps, customers praising solutions that were fuelled by your research
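Two of the user feedback metrics above are simple to calculate from survey responses. The sketch below assumes the commonly used conventions: NPS is taken from a 0-10 "how likely are you to recommend" question, with 9-10 counted as promoters and 0-6 as detractors, and CSAT is the share of 4 and 5 ratings on a 1-5 scale. The sample responses are made up for illustration; adjust the thresholds if your survey uses a different scale.

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    if not scores:
        raise ValueError("at least one response is needed")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def csat(ratings, satisfied_from=4):
    """CSAT: percentage of ratings at or above the 'satisfied' threshold."""
    if not ratings:
        raise ValueError("at least one response is needed")
    satisfied = sum(1 for r in ratings if r >= satisfied_from)
    return 100 * satisfied / len(ratings)


# Made-up responses from repository users.
recommend_scores = [10, 9, 8, 7, 6, 3, 10, 9]
satisfaction_ratings = [5, 4, 3, 5, 2]

print(nps(recommend_scores))       # 4 promoters, 2 detractors out of 8
print(csat(satisfaction_ratings))  # 3 of 5 ratings are 4 or above
```

Tracking these numbers each month or quarter shows the trend, which is usually more informative than any single reading.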

Think about how often you want to capture and communicate this information to the rest of the team, to help motivate everyone to keep making progress.

By establishing key metrics, you can track your progress and determine whether your repository is achieving its intended goals.

⏰ You should aim to create a measurable action plan for your repository within 60-90 mins (based on a group of up to 6 people).

To summarise

As with the development of any product, the cost of investing time upfront to ensure you’re building the right thing for your end users, is far lower than the cost of building the wrong thing - repositories are no different!

A well-executed research repository can be an extremely valuable asset for your organisation, but building one requires consideration and planning - and defining a clear vision and strategy upfront will help to maximise your chances of success.

It’s important to not feel pressured to nail every objective that you set in the first few weeks or months. Like any product, the further you progress, the more your strategy will evolve and shift. The most important thing is getting started with the right foundations in place, and starting to drive some real impact.

We hope this practical guide will help you to get started on building an effective research repository for your organisation. Thanks and happy researching!

Work with our team of experts

At Dualo we help teams to define a clear vision and strategy for their research repository as part of the ‘Discover, plan and set goals’ module facilitated by our Dualo Academy team.  If you’re interested in learning more about how we work with teams, book a short call with us to discuss how we can support you with the development of your research repository and knowledge management process.

Nick Russell

I'm one of the Co-Founders of Dualo, passionate about research, design, product, and AI. Always open to chatting with others about these topics.

Research data repositories

Research data repositories provide the best option for storing and publishing research data in the long term.

Specific repositories may be recommended by funders or publishers, while some funders operate data centres for the research they fund.

Most repositories have embargo arrangements to control data access if required, with similar arrangements for access to sensitive data. A searchable directory of research data repositories can be found at re3data.

Researchers are strongly encouraged to deposit their data in the University of Sheffield repository ORDA, or in a subject-specific repository or data centre. Staff should register details of their data in ORDA even if the data are stored or published elsewhere.

Find help to choose a repository in the sections below and in the Digital Curation Centre guide on where to keep research data.

Choosing a repository 

ORDA is a good option for most University of Sheffield anonymised research data, unless there is a subject-specific repository or data centre commonly used in your field. The table below provides guidance for choosing a suitable repository for your data.

Funder recommendations

While most funders require data to be preserved in an established data repository, some give specific recommendations:

  • AHRC ( ADS for archaeology)  
  • BBSRC (Various discipline-specific repositories)  
  • ESRC (UK Data Service)  
  • MRC (UK Data Service)  
  • NERC (NERC-funded data centres)

Publisher recommendations

Some publishers offer guidance to authors on data sharing, with some suggesting specific repositories:

  • PLOS (Recommended repositories)
  • Royal Society
  • Scientific Data (Nature Research)
  • Taylor & Francis

DOI allocation

The University Library has a contract with the DataCite consortium, through the British Library, to enable us to allocate DOIs.

If you are building or running a data archive (within a department or institute, for example) and would like to add a DOI-allocating capability to your system, contact us at  [email protected] .

For any further information, contact  [email protected] .

Harvard Dataverse

Harvard Dataverse is an online data repository where you can share, preserve, cite, explore, and analyze research data. It is open to all researchers, both inside and out of the Harvard community.

Harvard Dataverse provides access to a rich array of datasets to support your research. It offers advanced searching and text mining in over 2,000 dataverses, 75,000 datasets, and 350,000+ files, representing institutions, groups, and individuals at Harvard and beyond.

The Harvard Dataverse repository runs on the open-source web application Dataverse, developed at the Institute for Quantitative Social Science. Dataverse helps make your data available to others, and allows you to replicate others' work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.

Why Create a Personal Dataverse?

  • Easy set up
  • Display your data on your personal website
  • Brand it uniquely as your research program
  • Makes your data more discoverable to the research community
  • Satisfies data management plans

Terms to know

  • A Dataverse repository is the software installation, which then hosts multiple virtual archives called dataverses .
  • Each dataverse contains datasets, and each dataset contains descriptive metadata and data files (including documentation and code that accompany the data).
  • As an organizing method, dataverses may also contain other dataverses.
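The nesting described in these terms can be pictured as a small tree. The Python sketch below is only a mental model of that structure and not the Dataverse software's actual data model or API: the class names, fields, and sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    name: str  # a data file, documentation, or code accompanying the data


@dataclass
class Dataset:
    title: str
    metadata: dict = field(default_factory=dict)  # descriptive metadata
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Dataverse:
    name: str
    datasets: List[Dataset] = field(default_factory=list)
    children: List["Dataverse"] = field(default_factory=list)  # nested dataverses

    def count_datasets(self):
        """Count datasets here and in every nested dataverse."""
        return len(self.datasets) + sum(c.count_datasets() for c in self.children)


# A hypothetical institutional dataverse containing one lab dataverse.
lab = Dataverse(
    name="Example Lab",
    datasets=[
        Dataset(
            title="Survey responses",
            metadata={"subject": "Social Sciences"},
            files=[DataFile("responses.csv"), DataFile("README.txt")],
        )
    ],
)
university = Dataverse(
    name="Example University",
    datasets=[Dataset(title="Campus climate study")],
    children=[lab],
)

print(university.count_datasets())
```

The repository itself would sit one level above this tree, as the installation that hosts all of the top-level dataverses.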

How to choose a suitable data repository for your research data

Governments, funders, and institutions worldwide are increasingly introducing open data policies and mandates to encourage researchers to share their research data openly. Depositing your data in a publicly accessible research data repository that assigns a persistent identifier (PI or PID) ensures that your dataset remains available to humans and machines in the future. National institutes, funders, and journals often maintain a list of endorsed repositories for your use. You may need to set out your intention to deposit your research data in a repository as part of a data management plan (DMP). Still, choosing the best repository from such lists can often be daunting. Here, we offer some preliminary guidance on selecting the most suitable repository for your research data.

Where to share your data?

You know you want to make your data openly available, but where should you host it? Some researchers opt to host their data solely on a laboratory website or as part of a publication’s supplementary material. However, sharing data (or any other research outputs) in this way hinders others from finding and reusing it. That’s where data repositories come in.

What is a data repository?

According to the Registry of Research Data Repositories (re3data.org), a global registry of research data repositories, a repository is an online storage infrastructure for researchers to store data, code, and other research outputs for scholarly publication. Research data means information objects generated by scholarly projects, for example through experiments, measurements, surveys, or interviews. Depositing your data in a publicly accessible, recognized repository ensures that your dataset continues to be available to both humans and machines in a usable form.

An open access data repository openly stores data, including scientific data from research projects in a way that allows immediate user access to anyone. There are no limitations to the repository access. As such, repositories make data findable, accessible, and usable in the long term, by using sustainable file formats and providing persistent identifiers and informative descriptive data (metadata). 

Choosing a data repository

Nowadays, it is widely considered best practice to deposit your data in a publicly available repository, where it is assigned a persistent identifier (PI or PID) and can be accessed by anyone, anywhere. Where you deposit your data will depend on any applicable legal and ethical factors, who funded the work, and where you hope to publish. However, there are a few simple questions you can ask yourself to make selecting an appropriate repository easier.

Question #1: Does your data contain personal or sensitive information that cannot be anonymized?

If you answered ‘yes’ to this question, consider a controlled access repository.

There may be cases where openly sharing data is not feasible due to ethical or confidentiality considerations. Depending on what the Institutional Review Board approving your study said about data sharing, and what your participants consented to, it may still be possible to make your data accessible to authenticated users via a controlled-access repository or a generalist repository that allows you to limit access to your data.

Some of the repositories that allow you to limit access to your data include:

  • Figshare – You can generate a ‘private sharing link’ for free. You can send this link by email, and the recipient can access the data without logging in or having a Figshare account.
  • Zenodo – Funded by CERN, OpenAIRE, and Horizon 2020, Zenodo lets users deposit restricted files and share access with others if they meet certain requirements.
  • OSF – You can make your project private or public and alternate between the two settings.

If you answered ‘no’ to this question, move on to question #2.

Question #2: Is there a discipline-specific repository for your dataset?

If you answered ‘yes’ to this question, consider a discipline-specific repository.

Research data differs significantly across disciplines. Discipline-specific repositories offer specialist domain knowledge and curation expertise for particular data types. Plus, using a discipline-specific repository can also make your data more visible to others in your research community. We recommend speaking to your institutional librarian, funder, or colleagues for guidance on choosing a repository relevant to your discipline. 

If you answered ‘no’ to this question, move on to question #3.

Question #3: Does your institutional repository accept data?  

If you answered ‘yes’ to this question, consider your institutional repository.

Many institutions support their researchers by providing repository infrastructure for managing and depositing data. Institutional repositories that accept datasets provide stewardship, helping to ensure that your dataset is preserved and accessible.

If you answered ‘no’ to this question, consider a generalist data repository.

Generalist data repositories accept datasets regardless of discipline or institution. These repositories support a wide variety of file types and are particularly useful where a discipline-specific repository does not exist.

Some examples of generalist data repositories include:

  • 4TU.ResearchData
  • ANDS contributing repositories
  • Dryad Digital Repository
  • Harvard Dataverse
  • Mendeley Data
  • Open Science Framework
  • Science Data Bank
  • Code Ocean  
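
The three questions above form a simple decision flow. As a minimal sketch (the `Dataset` class, its field names, and the returned labels are hypothetical and exist only for illustration), the flow could be written in Python like this:

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical yes/no answers to questions #1, #2, and #3."""
    has_unanonymizable_sensitive_data: bool
    discipline_repository_exists: bool
    institutional_repository_accepts_data: bool


def recommend_repository_type(dataset: Dataset) -> str:
    """Ask the questions in order and return the first repository type that applies."""
    if dataset.has_unanonymizable_sensitive_data:  # Question #1
        return "controlled-access"
    if dataset.discipline_repository_exists:  # Question #2
        return "discipline-specific"
    if dataset.institutional_repository_accepts_data:  # Question #3
        return "institutional"
    return "generalist"  # every answer was 'no'


# Example: no sensitive data, no discipline repository, institution accepts data.
survey = Dataset(False, False, True)
print(recommend_repository_type(survey))
```

The order of the checks matters: a dataset with sensitive information is routed to controlled access even if a discipline-specific repository exists. In practice, legal, ethical, and funder requirements may override this simplified flow.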

Common questions about data repositories

What is a digital object identifier (DOI)?

When a researcher uploads a document to an online data repository, a digital object identifier (DOI) will be assigned. A DOI is a globally unique and persistent string that identifies your work permanently. A data repository can assign a DOI to any document. The DOI contains metadata that provides users with relevant information about an object, such as the title, author, keywords, year of publication, and the URL where that document is stored.

How do I find a ‘FAIR aligned’ repository?

The repository finder tool, developed by DataCite, allows you to search for certified repositories that support the FAIR data principles. The FAIR data principles aim to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Both FAIRsharing and Re3Data provide information on an array of criteria to help you identify the repositories most suited to your needs.

Should I use a discipline-specific repository?

If your funder does not have a preferred repository, you may wish to use a discipline-specific repository that is frequently used in your field of research. This type of repository will make it easy for your research community to find your data. There are many repositories of this type, including GEO or GenBank for genetic data and the UK Data Service for social sciences and humanities data.

What is versioning?

Some repositories accommodate changes to deposited datasets through versioning. Selecting a repository that features versioning gives you the flexibility to add new data, restructure, and improve your dataset. Each version of your dataset is uniquely identifiable and maintained, meaning others can find, access, reuse, and cite whichever version of the dataset they require.

How do I share de-identified research data?

Repositories vary widely, so it’s essential you choose the repository best suited to your research, whether it be a subject-specific, general, funder, or institutional repository. If you would like to share de-identified data, one option is the NICHD DASH. This repository allows researchers to store and access de-identified data from NICHD-funded research for the purposes of secondary research use.

Can I share research data with restricted access?

Restricted data deposit is possible. If you need to preserve study participant anonymity in clinical datasets, then there are repositories suitable for datasets requiring restricted data access. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.

Do I have to pay to deposit data to a repository?

Always check whether your repository requires a data publication fee. Not all repositories require data publication charges, and if your chosen repository does require a fee, you could still be entitled to sponsorship by a publisher or funder. Zenodo and Figshare both allow registered users to deposit data free of charge. However, Dryad charges a data publication fee.

What about my software and code?

Software and code are important research outputs. In addition to using a version control platform such as GitHub, you should deposit your source code in a data repository where it will be assigned a unique identifier. Using such a repository will ensure your code is openly and permanently available.

Choosing a repository for your research data might seem difficult at first, but sharing your data openly is vital to increasing the reproducibility of research. In turn, you can expect greater visibility for your work and a wider potential impact.

What Is a Data Repository? [+ Examples and Tools]

Anna Fitzgerald

Published: April 19, 2022

Businesses are collecting, storing, and using more data than ever before. This data is being used to improve the customer experience, support marketing and advertising efforts, and drive decision making. But more data means more challenges.

In a survey on customer experience (CX) among businesses in the United States, 49.8% identified the lack of reliability and integrity of available data as the main challenge affecting data analysis capability for CX. Data security, data privacy, and too many data sources were also identified as challenges.

To help you overcome these issues and get the most out of your data, you can store it in a data repository. Let’s take a close look at this term, then walk through some examples, benefits, and tools that can help you store and manage your data.

What is a data repository?

A data repository is a data storage entity in which data has been isolated for analytical or reporting purposes. Since it provides long-term storage and access to data, it is a type of sustainable information infrastructure.

While commonly used for scientific research, a data repository can also be used to manage business data. Let’s take a look at some challenges and benefits below.

What are the challenges of a data repository?

The challenges of a data repository all revolve around management. For example, data repositories can slow down enterprise systems as they grow, so it’s important to have software or another mechanism in place to scale your repository. You also need to ensure your repository is backed up and secure. That’s because a system crash or attack could compromise all your data since it’s stored in one place instead of distributed across multiple locations.

These challenges can be addressed by a solid data management strategy that addresses data quality, privacy, and other data trends.

To create your own, check out our guide Everything You Need to Know About Data Management.

What are the benefits of a data repository?

Having data from multiple sources in one place, and compartmentalized, makes it faster and easier to manage, analyze, and report on. It also improves the quality of data since it’s aggregated and preserved. Without a single repository, you’ll likely deal with duplicate data, missing data, and other issues that affect the quality of your analysis.

Now that we understand both the challenges and benefits of a data repository, let’s look at some examples.

Data Repository Examples

Data repository is a general term. There are several more specific terms or subtypes. Let’s take a look at some of these examples below.

Data Warehouse

A data warehouse is a centralized repository that stores large volumes of data from multiple sources in order to more efficiently organize, analyze, and report on it. Unlike a data mart, it covers multiple subjects, and unlike a data lake, its data is already filtered, cleaned, and defined for a specific use.

We’ll take a closer look at the difference between a data repository and a data warehouse below.

Data Repository Software

Choosing data repository software comes down to a few key factors, including sustainability, usability, and flexibility. Here are some questions to ask when evaluating different software:

  • Is the repository supported by a company or community?
  • What does the user interface look like?
  • Is the documentation clear and comprehensive?
  • What data formats does it support?

Answering these and other questions will help you pick the software that best meets your needs. Let’s take a look at some popular data repository software options below.

1. Ataccama

Best for: Multinational corporations and mid-sized businesses

Nielsen Norman Group

Research Repositories for Tracking UX Research and Growing Your ResearchOps

Kara Pernice

October 18, 2020

Every UX team needs to organize its user research in a research repository. I first worked on a research repository in the early 1990s. The lessons I learned then still hold true today, as the UX community gets serious about managing and growing user-research programs. These efforts now fall under the umbrella term “Research Ops” (with “Ops” being short for “operations”).

In This Article:

  • What Is a Research Repository
  • Relevant Elements in a Research Repository
  • Convenience and Findability Features in a Repository

A research repository is a shared collection of UX-research-related elements that should support the following functions at the organization level:

  • grow UX awareness and participation in UX work among leadership, product owners, and the organization at large
  • support UX research work, so UX professionals may be more productive as they plan and track research

There are two main types of content in a research repository:

  • The input to doing UX research: information for planning and conducting research
  • The output from doing UX research: study findings and reports

Before making a repository, analyze the UX-related processes and tools used (currently or in the near future) in your organization. Consider creating a mind map of how research gets done, or even a journey map or service blueprint of how research is initiated and results are used on development teams.

[Figure: wireframe of a research repository in three columns: a left menu of findings and reports; middle checkbox filters for topic, status, and date; and a list of findings on the right]

Some important components that can be housed in a research repository include:

Infrastructure

  • Research team’s mission and vision communicate what the team is about, how it works, and how it hopes to work in the future. This information can help others to understand the team’s capabilities, what they can expect, and what they can request. An example mission is: The UX-research team provides user and customer research and guidance for all products, services, and systems at the organization in order to maximize usefulness, usability, efficiency, enjoyment, and support for the organization’s vision.
  • Descriptions of research methods help the team learn or be reminded of a process and the reasons for different research types. Method descriptions and best practices can promote consistent high-quality work and even teach a less experienced researcher.
  • Tools and templates for conducting and analyzing research, such as templates for test plans, protocols, reports, interview scripts, user tasks, consent forms, and notetaking, as well as tips for using remote-research or analysis tools, could also be housed here.

Research Planning

  • Strategic research plans for the organization and for individual projects (like you might see in a research roadmap) can keep researchers and the rest of the team focused on the most important areas to research as opposed to every single product feature. When stored in a research repository, these are easy to find and access.
  • Schedules make research accessible to everyone by sharing the date, time, location, research method, and what’s being studied. Armed with this information, anyone can join or ask to join in on studies, or at least look for findings upon completion.
  • Detailed research plans communicate that research will be happening and how. When stored in a repository they serve as a vision document to align stakeholders and the rest of the team.
  • Research requests enable product teams to request user research to be done. Depending on the research team’s size, mission, planning, and culture, research requests may not be available at all organizations. Research requests can give insight into the research needs at your organization and can drive UX-team growth.

Data and Insights

  • Research reports tell what happened in the research study. They include overarching themes, detailed findings, and sometimes recommendations.
  • Research insights are the detailed findings or chunks of information acquired from each research study. While findings also appear in reports, saving them as their own entities makes it easier to digest them, mark their severity, track their status, and link to specific design and development assignments in the backlog or project database. In other words, each insight is digestible and easy to see, and thus more likely to get addressed.
  • Recordings and transcriptions stored in the repository or, alternatively, linked from the repository make user data easily accessible. Summarizing and transcribing each video allows teams to search for exactly what they're looking for. (Fun historical note: In the early 1990s, when usability-testing recordings were too large to store online, my team at Lotus created a video library. Developers could check out the physical videotapes as one would a book at a library. People were so dedicated that they borrowed them to watch the tests they had missed, and sometimes we had to make extra copies of tapes to meet the demand.)
  • Raw notes and artifacts from research sessions are often trashed after they have been analyzed. But some teams keep the notes in case they might be useful for future analysis. For example, if a team was in a rush and focused on one area of the design at the time of the study, later it may be able to revisit the notes to glean insights related to other aspects of the design. Those notes could help inform journey maps, personas, or other user-focused artifacts.

What Is NOT Always in a Research Repository

  • UX-data analysis is usually done with specialized tools. The result of the analysis could be a text file (for example, for quantitative data analysis done in software such as R) or could be hosted online in a tool-specific format. If the latter, then the repository can link to the result of the analysis. For example, researchers may have conducted thematic analysis using Dovetail; the full research report can include a link to that board so team members can see the reasoning behind the findings.
  • A participant repository or panel is usually not stored within a research repository, even though recruiting research participants is a core function of user research. That’s because the goals and audience for the two repositories tend to be quite different. But it can be helpful for them to link to one another.

There are many other components that research teams need to track internally but that are less likely to be part of a research repository, even though they may be linked from it: user stories in a backlog, participant recruiting tools, and budget tracking for research projects.

People should be able to easily find and discover information about research. Findable and accessible information makes it possible for the team to easily be part of a research project and feel ownership about the findings. Here are some repository attributes that make it easy to use:

  • Supporting tags and metadata, to help people find items by the most granular topics
  • Searchable by keyword (e.g. for research on a certain product feature), project, team, finding, severity, status, and more
  • Hosted in a tool that people can easily access, use, and learn, and that matches the organization’s culture and mental model
  • Portable, so that repository elements can be easily exported to other applications or formats
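
The attributes above (tags and metadata, keyword and field search, portability) can be illustrated with a small sketch. This is not how any particular repository tool works; the `Insight` record, its fields, and the sample entries are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Insight:
    """Hypothetical record for a single research finding."""
    title: str
    project: str
    severity: str
    status: str
    tags: list = field(default_factory=list)


def search(insights, keyword=None, **filters):
    """Return insights whose title or tags contain `keyword` (case-insensitive)
    and whose fields equal every value given in `filters`."""
    results = []
    for insight in insights:
        if keyword is not None:
            needle = keyword.lower()
            haystack = [insight.title.lower()] + [tag.lower() for tag in insight.tags]
            if not any(needle in text for text in haystack):
                continue
        if any(getattr(insight, name) != value for name, value in filters.items()):
            continue
        results.append(insight)
    return results


def export_json(insights):
    """Portable export: serialize insights to JSON for use in other tools."""
    return json.dumps([asdict(insight) for insight in insights], indent=2)


INSIGHTS = [
    Insight("Users miss the promo-code field at checkout", "Web store", "high", "open", ["checkout", "forms"]),
    Insight("Search filters reset after going back", "Web store", "medium", "open", ["search", "navigation"]),
    Insight("Checkout button label is unclear", "Mobile app", "low", "resolved", ["checkout", "copy"]),
]

open_checkout_issues = search(INSIGHTS, keyword="checkout", status="open")
print(export_json(open_checkout_issues))
```

Storing each finding as its own record, rather than only inside a report, is what makes filtering by severity or status possible.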

Research repositories store and organize information about UX research. They collect not only methodology-related documents, but also research results at various levels of granularity (from individual findings to reports). Their purpose is to streamline the work of the research team and also to make research widely available and easy to consume throughout the organization.

For more information about the growing ResearchOps community, see https://researchops.community/.

6 Repositories to Share Your Research Data

Diego Menchaca

Why share research data?

Sharing information stimulates science. When researchers choose to make their data publicly available, they are allowing their work to contribute far beyond their original findings.

The benefits of data sharing are immense. When researchers make their data public, they increase transparency and trust in their work, they enable others to reproduce and validate their findings, and ultimately, contribute to the pace of scientific discovery by allowing others to reuse and build on top of their data.

"If I have seen further it is by standing on the shoulders of Giants." Isaac Newton, 1675.

While the benefits of data sharing and open science are clear, sadly 86% of medical research data is never reused. A 2014 survey conducted by Wiley with over 2000 researchers across different fields found that 21% of surveyed researchers did not know where to share their data and 16% did not know how to do so.

In a series of articles on data sharing, we seek to break down this process for you and cover everything you need to know on how to share your research outputs.

In this first article, we will introduce essential concepts of public data and share six powerful platforms to upload and share datasets.

What is a Research Data Repository?

The best way to publish and share research data is with a research data repository. A repository is an online database that allows research data to be preserved across time and helps others find it.

Apart from archiving research data, a repository will assign a DOI to each uploaded object and provide a web page that tells what it is, how to cite it and how many times other researchers have cited or downloaded that object.

What is a DOI?

When a researcher uploads a document to an online data repository, a digital object identifier (DOI) will be assigned. A DOI is a globally unique and persistent string (e.g. 10.6084/m9.figshare.7509368.v1) that identifies your work permanently. 
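
To make the structure of the example string concrete: a DOI consists of a prefix beginning with "10." and a suffix, separated by the first slash, and it can be resolved by appending it to https://doi.org/. The Python sketch below uses hypothetical helper names (`split_doi`, `doi_url`) and performs only a basic check, not a full validation of DOI syntax.

```python
def split_doi(doi: str) -> tuple:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = doi.strip().partition("/")
    # Basic sanity check only: DOI prefixes begin with "10." and a suffix must follow.
    if not prefix.startswith("10.") or not separator or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_url(doi: str) -> str:
    """Build the resolver link for a DOI (assumes the suffix needs no URL escaping)."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{suffix}"


prefix, suffix = split_doi("10.6084/m9.figshare.7509368.v1")
print(prefix)  # 10.6084
print(suffix)  # m9.figshare.7509368.v1
print(doi_url("10.6084/m9.figshare.7509368.v1"))
```

In the example DOI, the trailing ".v1" is part of the suffix chosen by the repository; the DOI syntax itself does not give it any special meaning.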

A data repository can assign a DOI to any document, such as spreadsheets, images or presentations, and at different levels of hierarchy, like a collection of images or a specific chapter in a book.

The DOI contains metadata that provides users with relevant information about an object, such as the title, author, keywords, year of publication and the URL where that document is stored. 

The International DOI Foundation (IDF) developed and introduced the DOI in 2000. Registration Agencies, a federation of independent organizations, register DOIs and provide the necessary infrastructure that allows researchers to declare and maintain metadata.

Key benefits of the DOI system:

  • A more straightforward way to track research outputs
  • Gives certainty to scientific work
  • DOI's versioning system tracks changes to work over time
  • Can be assigned to any document
  • Enables proper indexation and citation of research outputs

Once a document has a DOI, others can easily cite it. A handy tool to convert DOIs into citations is the DOI Citation Formatter.
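
A formatted citation can also be requested programmatically. The sketch below does not use the DOI Citation Formatter itself. It relies on a different mechanism, DOI content negotiation, and assumes the DOI's registration agency supports the `text/x-bibliography` media type. The helper name `citation_request` is hypothetical, and the request is only built, not sent.

```python
from urllib.request import Request


def citation_request(doi: str, style: str = "apa") -> Request:
    """Build (but do not send) a request asking the DOI resolver for a formatted citation."""
    return Request(
        f"https://doi.org/{doi}",
        # Assumption: the registration agency honors this media type and style name.
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


request = citation_request("10.6084/m9.figshare.7509368.v1")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would send it over the network; whether a citation comes back depends on the service being reachable and on the agency supporting the requested style.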

Six repositories to share research data

Now that we have covered the role of a DOI and a data repository, below is a list of 6 data repositories for publishing and sharing research data.

1. Figshare

Figshare is an open access data repository where researchers can preserve their research outputs, such as datasets, images, and videos and make them discoverable. 

Figshare allows researchers to upload any file format and assigns a digital object identifier (DOI) for citations. 

Mark Hahnel launched Figshare in January 2011. Hahnel first developed the platform as a personal tool for organizing and publishing the outputs of his PhD in stem cell biology. More than 50 institutions now use this solution. 

Figshare releases 'The State of Open Data' every year to assess the changing academic landscape around open research.

Free accounts on Figshare can upload files of up to 5 GB and get 20 GB of free storage.

2. Mendeley Data

Mendeley Data is an open research data repository, where researchers can store and share their data. Datasets can be shared privately between individuals, as well as publicly with the world. 

Mendeley's mission is to facilitate data sharing. In their own words, "when research data is made publicly available, science benefits:

- the findings can be verified and reproduced

- the data can be reused in new ways

- discovery of relevant research is facilitated

- funders get more value from their funding investment."

Datasets uploaded to Mendeley Data go into a moderation process where they are reviewed. This ensures the content constitutes research data, is scientific, and does not contain a previously published research article. 

Researchers can upload and store their work free of cost on Mendeley Data.

3. Dryad Digital Repository

Dryad is a curated general-purpose repository that makes data discoverable, freely reusable, and citable.

Most types of files can be submitted (e.g., text, spreadsheets, video, photographs, software code) including compressed archives of multiple files.

Since a guiding principle of Dryad is to make its contents freely available for research and educational use, there are no access costs for individual users or institutions. Instead, Dryad supports its operation by charging a US$120 fee each time data is published.

4. Harvard Dataverse

Harvard Dataverse is an online data repository where scientists can preserve, share, cite and explore research data.

The Harvard Dataverse repository is powered by the open-source web application Dataverse, developed by the Institute for Quantitative Social Science at Harvard.

Researchers, journals and institutions may choose to install the Dataverse web application on their own server or use Harvard's installation. Harvard Dataverse is open to all scientific data from all disciplines.

Harvard Dataverse is free and has a limit of 2.5 GB per file and 10 GB per dataset.

5. Open Science Framework

 OSF is a free, open-source research management and collaboration tool designed to help researchers document their project's lifecycle and archive materials. It is built and maintained by the nonprofit Center for Open Science.

Each user, project, component, and file is given a unique, persistent uniform resource locator (URL) to enable sharing and promote attribution. Projects can also be assigned digital object identifiers (DOIs) if they are made publicly available. 

OSF is a free service.

6. Zenodo

Zenodo is a general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN. 

Zenodo began as the OpenAIRE orphan records repository, with the mission to provide open science compliance to researchers without an institutional repository, irrespective of their subject area, funder or nation.

Zenodo encourages users to upload their research outputs early in the research lifecycle by allowing them to be kept private. Once an associated paper is published, datasets are automatically made open.

Zenodo has no restriction on the file types that researchers may upload and accepts datasets of up to 50 GB.
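The size limits quoted in this article (2.5 GB per file and 10 GB per dataset for Harvard Dataverse, 50 GB per dataset for Zenodo) can be checked before an upload is attempted. The following is a minimal sketch only: the names `SizeLimits`, `LIMITS` and `size_problems` are hypothetical, the figures are the ones stated here rather than values read from either service, and a gigabyte is assumed to mean 10^9 bytes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GB = 10**9  # assumption: decimal gigabytes; a repository may count differently


@dataclass(frozen=True)
class SizeLimits:
    per_file: Optional[int]     # bytes; None when the article states no per-file limit
    per_dataset: Optional[int]  # bytes; None when the article states no dataset limit


# Limits as quoted in the article, not fetched from the repositories themselves.
LIMITS = {
    "Harvard Dataverse": SizeLimits(per_file=int(2.5 * GB), per_dataset=10 * GB),
    "Zenodo": SizeLimits(per_file=None, per_dataset=50 * GB),
}


def size_problems(repository: str, file_sizes: Dict[str, int]) -> List[str]:
    """Return human-readable problems; an empty list means the sizes fit."""
    limits = LIMITS[repository]
    problems = []
    if limits.per_file is not None:
        for name, size in sorted(file_sizes.items()):
            if size > limits.per_file:
                problems.append(f"{name} exceeds the per-file limit")
    total = sum(file_sizes.values())
    if limits.per_dataset is not None and total > limits.per_dataset:
        problems.append("dataset exceeds the total size limit")
    return problems


# Hypothetical dataset: one 1 GB table and one 4 GB archive.
files = {"survey.csv": 1 * GB, "scans.zip": 4 * GB}
print(size_problems("Harvard Dataverse", files))
print(size_problems("Zenodo", files))
```

In this example the 4 GB archive is flagged for Harvard Dataverse, while the same files fit within the Zenodo figure. Limits change over time, so the repository's current documentation should be treated as the authority.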

Research data can save lives, help develop solutions and maximise our knowledge. Promoting collaboration and cooperation among a global research community is the first step to reduce the burden of wasted research.

Although the waste of research data is an alarming issue with billions of euros lost every year, the future is optimistic. The pressure to reduce the burden of wasted research is pushing journals, funders and academic institutions to make data sharing a strict requirement.  

We hope with this series of articles on data sharing that we can light up the path for many researchers who are weighing the benefits of making their data open to the world.

The six research data repositories shared in this article are a practical way for researchers to preserve datasets across time and maximize the value of their work.

Cover image by Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO.

Dear Digital Diary,

I realized that there is an unquestionable comfort in being misunderstood. For to be understood, one must peel off all the emotional layers and be exposed. This requires both vulnerability and strength. I guess by using a physical diary (a paper and a pen), I never felt like what I was saying was analyzed or judged. But I also never thought I was understood.

Paper does not talk back. Using a daily digital diary has required emotional strength. It has required the need to trust and the need to provide information to be helped and understood. Using a daily diary has needed less time and effort than a physical diary as I am prompted to interact through mobile notifications. I also no longer relay information from memory, but rather the medical or personal insights I enter are real-time behaviours and experiences.

The interaction is more organic. I also must confess this technology has allowed me to see patterns in my behaviour that I would have otherwise never noticed. I trust that the data I enter is safe as it is password protected. I also trust that I am safe because my doctor and nutritionist can view my records in real-time.

Also, with the data entered being more objective and diverse through pictures and voice recordings, my treatment plan has been better suited to my needs.

Sincerely, No more elephants in this room

Diego Menchaca

Diego is the founder and CEO of Teamscope. He started Teamscope from a scribble on a table. It instantly became his passion project and a vehicle into the unknown. Diego is originally from Chile and lives in Nijmegen, the Netherlands.

Data Repository Guidance

Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an appropriate public data repository. Data should be submitted to discipline-specific, community-recognized repositories where possible. Where a suitable discipline-specific resource does not exist, data should be submitted to a generalist repository.

Authors must deposit their data to a data repository as part of the manuscript submission process; manuscripts will not otherwise be sent for review. If data have not been deposited to a repository prior to manuscript submission, we offer a service to deposit them at figshare or Dryad during the submission process via our article submission platform. Data may also be deposited to these resources temporarily, if the main host repository does not support confidential peer review (see below).

Repositories need to meet our requirements for anonymous peer-review, data access, preservation, resource stability, licences and suitability for use by all researchers with the appropriate types of data:

  • Use open licences (CC0 and CC-BY, or their equivalents, are required in most cases). Exceptions will only be permitted for human-derived data that is considered sensitive (e.g. risk of participant identification, controls on specific uses, etc.), where we suggest data are shared under Data Usage Agreements (DUAs). We do not typically support the use of more restrictive CC licences (those containing SA, NC or ND clauses) for either sensitive or non-sensitive datasets, other than where applied to third-party data that has been re-used and the original licence needs to be retained.
  • Allow public access to data without barriers, such as formal application processes, unless required for sensitive human datasets requiring controlled access and Data Usage Agreements. Basic login functionalities, where data are captured for analytics purposes only, are accepted for non-sensitive datasets as long as immediate access is granted to the holder of the email address without manual checks. However, we encourage login-free HTTPS access without registration in most cases.
  • All data need to be available for peer review. Where logins or other barriers are required or temporarily applied, routes for confidential peer review of submitted datasets need to be provided that do not reveal the identity of the reviewer to the data owner/author of the associated article. Please consult with the repository to arrange this, or provide the data in a temporary location for peer review. 
  • Ensure long-term persistence and preservation of datasets in their published form. All Data Descriptors need to be associated with live data, so long term preservation and persistence is required to avoid future correction or other action to ensure the integrity of the paper. 
  • Provide stable persistent identifiers for submitted datasets. DOIs are the default for most non-omics datasets described in the journal. 
  • Subject specific repositories that are supported and recognized within their scientific community are strongly encouraged - general repositories should be used where no suitable subject repository is available, or the repository does not meet the requirements above. 
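The licence requirement in the list above (CC0 or CC-BY, with no SA, NC or ND clause) lends itself to a quick automated pre-check. The sketch below is illustrative only: `licence_is_open` is a hypothetical helper, it assumes licences are written as Creative Commons style identifiers such as "CC-BY-4.0", and it does not model the exceptions for sensitive human data or retained third-party licences, nor any non-CC "equivalent" licence.

```python
RESTRICTIVE_CLAUSES = {"SA", "NC", "ND"}  # clauses the guidance does not typically support


def licence_is_open(licence: str) -> bool:
    """Return True for CC0 or CC-BY identifiers that carry no SA, NC or ND clause.

    Anything that is not recognisably CC0 or CC-BY returns False, meaning
    "needs a manual check", not "definitely unacceptable".
    """
    parts = licence.strip().upper().replace("_", "-").replace(" ", "-").split("-")
    if parts[0] == "CC0":
        return True
    if parts[:2] != ["CC", "BY"]:
        return False
    return not RESTRICTIVE_CLAUSES.intersection(parts[2:])


for candidate in ["CC0-1.0", "CC-BY-4.0", "CC BY-NC 4.0", "CC-BY-SA-4.0"]:
    print(candidate, licence_is_open(candidate))
```

A False result is best read as a prompt to look at the licence text and the exceptions described above, rather than as a final verdict.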

The list below is intended as a guide for those who are unsure where to deposit their data, and provides examples of repositories from a number of disciplines. Please note this list does not constitute a formal or exclusive list of repositories accepted by the journal and there are many more repositories that meet our criteria than we are able to track. The list is no longer updated (since 2021), but is retained as a useful list of suggestions. 

Authors may also wish to use external resources such as DataCite’s Repository Finder and the FAIRsharing registry to find an appropriate repository for their data. Please note that certain data types (e.g. most omics and crystallographic data) are subject to mandates on which repository should be used. Please see our policy on mandated data types for further information.

  • Biological sciences: Nucleic acid sequence ; Protein sequence ; Molecular & supramolecular structure ; Neuroscience ; Omics ; Taxonomy & species diversity ; Mathematical & modelling resources ; Cytometry and Immunology ; Imaging ; Organism-focused resources
  • Health sciences
  • Chemistry and Chemical biology
  • Earth, Environmental and Space sciences: Broad scope Earth & environmental sciences; Astronomy & planetary sciences; Biogeochemistry and Geochemistry; Climate sciences; Ecology; Geomagnetism & Palaeomagnetism; Ocean sciences; Solid Earth sciences
  • Materials science
  • Social sciences
  • Generalist repositories

Biological sciences

Nucleic acid sequence

Novel DNA sequence, novel RNA sequence, and novel genome assembly data must be deposited to repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC) or to those which are working towards INSDC inclusion (as listed below), unless there are privacy or ethics restrictions that prevent open sharing of such data. These data may in addition be deposited to regional and national repositories as required. For human data that requires special controls, please see our recommended health sciences repositories.

Protein sequence

Molecular & supramolecular structure

These repositories accept structural data for small molecules; peptides and proteins (all); and larger assemblies (EMDB).

Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission, and should include a .cif file and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.

Neuroscience

These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenfMRI must be de-identified.

Functional genomics

Functional genomics is a broad experimental category, and Scientific Data's recommendations in this discipline likewise bridge disparate research disciplines. Data should be deposited following the relevant community requirements where possible.

Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations.

For data linking genotyping and phenotyping information in human subjects, we strongly recommend submission to dbGAP, EGA or JGA, which have mechanisms in place to handle sensitive data.

Metabolomics & Proteomics

We ask authors to submit proteomics data to members of the ProteomeXchange consortium (listed below), following the MIAPE recommendations.

Taxonomy & species diversity

Mathematical & modelling resources

Cytometry and Immunology

Organism-focused resources

These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, we ask that data records be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).

Health sciences

Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required for the preservation of study participant anonymity in clinical datasets. We suggest contacting repositories directly to determine those with data access controls best suited to the specific requirements of your study.

Chemistry and Chemical biology

Earth, Environmental and Space sciences

Broad scope Earth & environmental sciences

Astronomy & planetary sciences

Biogeochemistry and Geochemistry

Climate sciences

Geomagnetism & Palaeomagnetism

Ocean sciences

Solid Earth sciences

Materials science

Social sciences

Generalist repositories

Scientific Data encourages authors to archive data to one of the above data-type specific repositories where possible. Where a data-type specific repository is not available, the following generalist repositories might be suitable. Generalist repositories may also be appropriate for archiving associated analyses, or experimental-control data, supplementing the primary data in a discipline-specific repository.

The generalist repositories listed below are able to accept data from all researchers, regardless of location or funding source. If your institution has its own generalist data repository, this can be used to host your data as long as the repository is able to mint DataCite DOIs and allows data to be shared under open terms of use (for example the CC0 waiver). Please note that if your chosen repository is unable to support confidential peer review, you will be asked to temporarily deposit a copy of the dataset to one of our integrated generalist repositories to facilitate review of your article. Upon completion of peer review, the temporary copy will be erased. To use a repository which does not appear in the manuscript submission system, select 'DataCite DOI' as the repository name during the submission process.

Research Data Management (RDM): Data Repositories

What is research data?

Research data, unlike other types of information, is collected, observed, or created for the purposes of analysis to produce original research results (source: http://www.ed.ac.uk/is/data-management).

What is research data management?

Research data management is the process of controlling the information generated during a research project. Any research will require some level of data management, and funding agencies are increasingly requiring scholars to plan and execute good data management practices. Managing data is an integral part of the research process. It can be challenging particularly when studies involve several researchers and/or when studies are conducted from multiple locations. How data is managed depends on the types of data involved, how data is collected and stored, and how it is used – throughout the research lifecycle.  https://tinyurl.com/v6yferx

  • Love Data Publishing at UP

UP Research Data Repository (Figshare)

The University's research data repository (https://researchdata.up.ac.za/) runs on a Figshare instance and can be searched online.

Figshare is an international cloud-based research data repository that supports the dissemination phase of the research data management life cycle.

This data repository facilitates data publishing, sharing and collaboration in academic research, allowing UP to manage and in some cases showcase its data to the wider research community.

The procedure for uploading datasets to the UP research data repository is as follows:

  • For students: Log in to the UP Portal. Click on Add/Remove Portlet and then add the Research Data Repository portlet. This portlet will now be permanently on your Student Portal interface. On the Research Data Repository portlet, click on Research Data Repository Platform. A Safire authentication page will now open. Click Yes, continue. Figshare will now open. You can now upload the data file(s) and complete the metadata fields. When done, tick the Publish box and click on Save Changes. The library will then review the metadata and do the final approval.
  • For other researchers and academic staff: Log in to the UP Portal. Go to the Research Data Repository portlet. On the Research Data Repository portlet, click on Research Data Repository Platform. A Safire authentication page will now open. Click Yes, continue. Figshare will now open. You can now upload the data file(s) and complete the metadata fields. When done, tick the Publish box and click on Save Changes. The library will then review the metadata and do the final approval.

For further assistance or queries, please contact Rosina Ramokgola by e-mail at [email protected] or rdm@up.ac.za.

  • Open Data in the Humanities and Social Sciences

UP Space Institutional Repository

This repository is used to upload and preserve theses, dissertations and research articles authored by UP authors.

https://repository.up.ac.za/

Where can I search for datasets and other data repositories?

  • Search for datasets and other data repositories on re3data.org, a registry of research data repositories
  • Search for datasets from other South African universities at https://southafrica.figshare.com/

A step-by-step guide on how to upload data onto the UP Data Repository

  • The UP Research Data Repository can be searched at https://researchdata.up.ac.za/
  • How to upload your data onto the UP Data Repository: this document gives a step-by-step guide on how to upload your research data onto the UP Data Repository (Figshare)

Figshare Metadata for research data

The DLS cares about metadata for research data and other outputs and maintains metadata standards for research data to be discovered. This complies with FAIR principles for research data to be Findable, Accessible, Interoperable, and Re-usable to benefit others. Help others discover, understand, and use your data by describing and documenting it.

  • Figshare Metadata Elements and Guidance
  • Last Updated: Mar 4, 2024 1:31 PM
  • URL: https://library.up.ac.za/c.php?g=356288
Research Data Management: Dryad Data Repository for UW Researchers

Introduction to Dryad

What is Dryad?

Dryad is a nonprofit data repository offering researchers a secure location for research data storage. UW is now a Dryad member and UW researchers can deposit their data in Dryad at no additional cost to them; deposit costs are covered by the Libraries' membership fee. This is especially important for researchers with data sets that do not fit in any of the federal or subject repositories.

Benefits of Submitting Your Data to Dryad

  • Complies with funders' data access and sharing mandates (for example, NIH's new DMSP)
  • Partners with major journal publishers, making manuscript submission easy
  • Provides metrics to track how individual data is viewed, shared, cited, and downloaded
  • Provides option to upload code, scripts, and software that can be automatically sent to Zenodo
  • Curates data submitted for data and metadata integrity
  • Preserves your data in a CoreTrustSeal-certified repository

Uploading Your Data to Dryad

Dryad makes it easy for researchers to upload and share their data. 

1. Log in with your ORCID iD

  • The first time you use Dryad, you need to log in with your ORCID iD. Don't have one? You can create one straight from the Dryad login page (see this guide for more information).
  • Because UW is a Dryad member, you will authenticate with ORCID just once to verify your identity.
  • After the first login you will be prompted for your UW credentials. Once logged in, you will see the UW logo and can start submitting your data.

2. Describe your Data

  • You will be asked to enter metadata. Dryad provides tips for complete and comprehensive metadata.
  • Refer to Dryad's submission guidelines for assistance.

3. Data Types

Dryad accepts many different types of data:

  • If submitting human subjects data, make sure your data is anonymized and follows both legal and ethical guidelines.
  • All data deposited with Dryad must be complete and open to the public.
  • All data must be compatible with the Creative Commons Zero license. Find out more about Creative Commons licenses here.

4. Data Curation

Dryad curates (reviews) and then publishes your data. Select the Private for Peer Review check box if you want to suspend data publication until after the peer review process is complete. In all cases, the Dryad folks ensure your data is curated. If they have questions, they will be in touch.

5. Data Publication

Dryad notifies you once your data is published and provides you with a permanent DOI (digital object identifier) you can use to cite your data. You can update and re-version your data set at any time. 
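Once a DOI has been issued, it can be turned into a resolvable link for citations by prefixing it with the doi.org resolver address. The sketch below shows one way to do that; `doi_url` is a hypothetical helper, the sample DOI is a made-up placeholder rather than a real Dryad record, and the validity check is deliberately loose.

```python
def doi_url(doi: str) -> str:
    """Turn a bare DOI, a 'doi:' string, or a doi.org link into an https://doi.org/ URL."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Loose sanity check only: DOIs start with "10." and contain a "/" separator.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Placeholder DOI for illustration; replace it with the one Dryad sends you.
print(doi_url("doi:10.5061/dryad.example"))
```

Storing the bare DOI and generating the link when needed keeps citations consistent regardless of how the identifier was originally copied.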

More questions? See Dryad's Frequently Asked Questions.

Thanks to NMSU for letting us borrow from their helpful Dryad libguide.

More information:

  • Dryad submission process
  • Dryad Terms of Service
  • Data Publishing Ethics, from Dryad

  • Last Updated: Apr 16, 2024 3:01 PM
  • URL: https://guides.lib.uw.edu/research/dmg

Enabling HIPAA-Compliant Clinical Research at Stanford

Enabling Data Driven Clinical Research

PHI Download

STARR Tools allow you to download data files for up to 7500 patients to your desktop. Once downloaded, however, these files pose a significant data privacy risk, since it can be easy to forget that they are confidential and must be handled at all times in accordance with the language in the IRB used to generate them. Accordingly, PHI is scrubbed using best-effort techniques prior to download.

If you have a legitimate research need for PHI in downloaded data files, you must

  • Have permission to work with PHI online, and
  • Obtain an exemption to the no PHI in downloads policy

You can request an exemption from the no PHI in downloads policy by filling out this survey. Please note that in addition to completing a Data Privacy Attestation (either primary or add-on), you must be named on the IRB you are requesting an exemption for, and you will be required to meet to discuss your project needs as part of the exemption consideration process.

If your exemption is granted, you will be able to download PHI from STARR Tools. As you do so, we ask that you please continue to adhere to the HIPAA "Minimum Necessary" principle and only download the PHI strictly needed to fulfil your research mission, and delete the data files as soon as the use for the PHI has been accomplished.

The full list of requirements for identified data download is:

  • You must be named on the IRB protocol
  • The IRB must have an associated approved primary Data Privacy Attestation
  • If you are not the signatory of the primary DPA, you must have completed an add-on attestation
  • You must have an approved exemption to our policy of scrubbing PHI from data download files
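The four requirements listed above can be read as a simple checklist. The following sketch encodes them so that a request can be pre-screened; it is an illustration only, with hypothetical class and field names, and it is not part of STARR Tools or any Stanford system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DownloadRequest:
    """Facts about one researcher and one IRB protocol (field names are hypothetical)."""
    named_on_irb: bool
    irb_has_approved_primary_dpa: bool
    is_primary_dpa_signatory: bool
    has_addon_attestation: bool
    has_phi_exemption: bool


def unmet_requirements(req: DownloadRequest) -> List[str]:
    """Return the requirements that are not yet satisfied, in the order listed above."""
    unmet = []
    if not req.named_on_irb:
        unmet.append("be named on the IRB protocol")
    if not req.irb_has_approved_primary_dpa:
        unmet.append("have an approved primary Data Privacy Attestation on the IRB")
    # The add-on attestation is only needed by people who did not sign the primary DPA.
    if not req.is_primary_dpa_signatory and not req.has_addon_attestation:
        unmet.append("complete an add-on attestation")
    if not req.has_phi_exemption:
        unmet.append("obtain an exemption to the PHI scrubbing policy")
    return unmet


request = DownloadRequest(
    named_on_irb=True,
    irb_has_approved_primary_dpa=True,
    is_primary_dpa_signatory=False,
    has_addon_attestation=False,
    has_phi_exemption=True,
)
print(unmet_requirements(request))
```

An empty list means all four conditions are met on paper; the exemption review itself still involves the meeting described above.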

COMMENTS

  1. What is a Research Repository? Benefits and Uses

    A research repository acts as a centralized database where information is gathered, stored, analyzed, and archived in one organized space. In this single source of truth, raw data, documents, reports, observations, and insights can be viewed, managed, and analyzed. This allows teams to organize raw data into themes, gather actionable insights ...

  2. Data Repositories

    Data Repositories. A key aspect of data sharing involves not only posting or publishing research articles on preprint servers or in scientific journals, but also making public the data, code, and materials that support the research. Data repositories are a centralized place to hold data, share data publicly, and organize data in a logical manner.

  3. What is a Data Repository? (Definition, Examples, & Tools)

    A data repository is also known as a data library or data archive. This is a general term to refer to a data set isolated to be mined for data reporting and analysis. The data repository is a large database infrastructure — several databases — that collect, manage, and store data sets for data analysis, sharing and reporting.

  4. Data Repository: What it is, Types and Guide

    A data repository is a data library or data archive. The term may refer to a large database management system or to several databases that collect, manage, and store sensitive data sets for data analysis, sharing, and reporting. Authorized users can easily access and retrieve data by using query and search tools, which helps with research and ...

  5. Finding Datasets, Data Repositories, and Data Standards

    Mendeley Data: A data index and open research data repository from publisher Elsevier where users can search across research data from 2000+ generalist and domain-specific repositories. Filter results by date range, data type, source type (article or data repository), and source. Generalist Repositories. Here's a closer look at a few major cross-disciplinary repositories highlighted on the ...

  6. Research Data Repositories: Finding and Storing Data

    A data repository is a storage space for researchers to deposit data sets associated with their research. And if you're an author seeking to comply with a journal or funder data sharing policy, you'll need to identify a suitable repository for your data. An open access data repository openly stores data in a way that allows immediate user ...

  7. Research Data Repositories: Home

    A research data repository is a virtual place to store and preserve research data. Depositing research data in a repository increases data transparency and exposure and promotes research collaboration opportunities. There are multidisciplinary, subject-based, and special purpose repositories available to researchers worldwide.

  8. What a difference a data repository makes: Six ways depositing data

    Data is key to verification, replication, reuse, and enhanced understanding of research conclusions. When your data is in a repository—instead of an old hard drive, say, or even a Supporting Information file—its impact and its relevance are magnified. Here are six ways that putting your data in a public data repository can help your research go further.

  9. A Definitive Guide to Research Repositories (With Examples)

    A research repository is a tool that professional user experience (UX) designers use to organize research across multiple professionals. A research repository handles two functions within an organization: growing the awareness of how user experience is important to leadership, product owners and organizations and supporting designers through ...

  10. Understanding and using data repositories

    A data repository is a storage space for researchers to deposit data sets associated with their research. And if you're an author seeking to comply with a journal data sharing policy, you'll need to identify a suitable repository for your data. An open access data repository openly stores data in a way that allows immediate user access to ...

  11. Sharing research data for journal authors

    Mendeley Data is a certified, free-to-use repository that hosts open data from all disciplines, whatever its format (e.g. raw and processed data, tables, codes and software). With many Elsevier journals, it's possible to upload and store your data to Mendeley Data during the manuscript submission process.

  12. How to build a research repository: a step-by-step guide to ...

    Revisit the past 6 months of research and capture the data we want to add to our repository as an initial body of knowledge. Create the first draft taxonomy for our research repository, testing this with a small group of wider stakeholders. Launch the repository with an initial body of knowledge to a group of wider repository champions.

  13. Research data repositories

    Research data repositories provide the best option for storing and publishing research data in the long term. Specific repositories may be recommended by funders or publishers, while some funders operate data centres for the research they fund. Most repositories have embargo arrangements to control data access if required, with similar ...

  14. Data Repository Explained in 5 Minutes

    A data repository is a library or archive that contains data to support analysis and reporting functions in research or business operations. In practice, a data repository is a general term that refers to the centralized location where data is stored. It can refer to a single storage device or a set of databases spanning across different devices.

  15. Harvard Dataverse

    Harvard Dataverse is an online data repository where you can share, preserve, cite, explore, and analyze research data. It is open to all researchers, both inside and out of the Harvard community. Harvard Dataverse provides access to a rich array of datasets to support your research. It offers advanced searching and text mining in over 2,000 ...

  16. How to choose a suitable data repository for your research data

    What is a data repository? According to the Registry of Research Data Repositories (re3data.org), a global registry of research data repositories, a repository is an online storage infrastructure for researchers to store data, code, and other research outputs for scholarly publication. Research data means information objects generated by scholarly projects, for example, through experiments ...

  17. What Is a Data Repository? [+ Examples and Tools]

    A data repository is a data storage entity in which data has been isolated for analytical or reporting purposes. Since it provides long-term storage and access to data, it is a type of sustainable information infrastructure. While commonly used for scientific research, a data repository can also be used to manage business data.

  18. Research Repositories for Tracking UX Research and Growing Your ResearchOps

    A research repository is a shared collection of UX-research-related elements that should support the following functions at the organization level: grow UX awareness and participation in UX work among leadership, product owners, and the organization at large. support UX research work, so UX professionals may be more productive as they plan and ...

  19. 6 Repositories to Share Research Data

    2. Mendeley Data. Mendeley Data is an open research data repository, where researchers can store and share their data. Datasets can be shared privately between individuals, as well as publicly with the world. Mendeley's mission is to facilitate data sharing. In their own words, "when research data is made publicly available, science benefits ...

  20. Open research data repositories: Practices, norms, and metadata for

    In addition, there is a lack of detailed knowledge about what research data repositories entail, especially when it comes to sharing images as data. In this study, which is part of a larger research project on the development of digital sharing practices in the visual cultural heritage, we focus on the data sharing opportunities for researchers ...

  21. Data Repository Guidance

    Data Repository Guidance. Scientific Data mandates the release of datasets accompanying our Data Descriptors, but we do not ourselves host data. Instead, we ask authors to submit datasets to an ...

  22. Research Data Management (RDM): Data Repositories

    Figshare is an international cloud-based Research Data Repository in support of the dissemination phase of the Research Data Management life cycle. This data repository facilitates data publishing, sharing and collaboration of academic research, allowing UP to manage and in some cases showcase its data to the wider research community.

  23. Dryad Data Repository for UW Researchers

    Guide of resources related to the many aspects of research data management. Data management encompasses the processes surrounding collecting, organizing, describing, sharing, and preserving data. ... Dryad is a nonprofit data repository offering researchers a secure location for research data storage. UW is now a Dryad member and UW researchers ...

  25. PHI Download

    PHI Download. STARR Tools allow you to download data files for up to 7500 patients to your desktop. Once downloaded, however, these files pose a significant data privacy risk, since it can be easy to forget that they are confidential and must be handled at all times in accordance with the language in the IRB used to generate them.