data warehousing Recently Published Documents


TinyLFU-Based Semi-Stream Cache Join for Near-Real-Time Data Warehousing

Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is a join between a fast stream (S) and a slow disk-based relation (R). Huge amounts of data are now generated daily and need to be analyzed instantly to support successful business decisions. With this in mind, a well-known algorithm called CACHEJOIN (Cache Join) was proposed. The limitation of the CACHEJOIN algorithm is that it does not deal efficiently with frequently changing trends in stream data. To overcome this limitation, in this paper we propose TinyLFU-CACHEJOIN, a modified version of the original CACHEJOIN algorithm designed to improve its performance. TinyLFU-CACHEJOIN employs a strategy that keeps in the cache only those records of R that have a high hit rate in S. This mechanism allows TinyLFU-CACHEJOIN to deal with sudden and abrupt trend changes in S. We developed a cost model for our TinyLFU-CACHEJOIN algorithm and validated it empirically. We also compared the performance of our proposed TinyLFU-CACHEJOIN algorithm with the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments showed that the TinyLFU-CACHEJOIN algorithm significantly outperforms the CACHEJOIN algorithm.
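The idea of keeping only frequently hit records of R in the cache can be illustrated with a minimal sketch. This is not the authors' TinyLFU-CACHEJOIN; it is a toy frequency-based admission test in the spirit of TinyLFU, and it assumes exact counters in place of TinyLFU's approximate counting and omits counter aging. The class name, method names, and the `load_from_disk` callback are hypothetical.

```python
from collections import Counter


class FrequencyAdmissionCache:
    """Toy cache of R records with a TinyLFU-style admission test (assumed design)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.freq = Counter()  # exact counts stand in for an approximate frequency sketch
        self.cache = {}        # join key -> cached record of R

    def lookup(self, key, load_from_disk):
        """Return the R record matching a stream tuple's join key."""
        self.freq[key] += 1
        if key in self.cache:
            return self.cache[key]      # cache hit: no disk access needed
        record = load_from_disk(key)    # cache miss: fall back to the disk-based relation
        self._maybe_admit(key, record)
        return record

    def _maybe_admit(self, key, record):
        if len(self.cache) < self.capacity:
            self.cache[key] = record
            return
        # The least frequently seen cached key is the eviction candidate.
        victim = min(self.cache, key=lambda k: self.freq[k])
        # Admit the newcomer only if it has been seen more often than the victim.
        if self.freq[key] > self.freq[victim]:
            del self.cache[victim]
            self.cache[key] = record


# Hypothetical relation R and stream S of join keys.
R = {k: f"row-{k}" for k in range(10)}
cache = FrequencyAdmissionCache(capacity=2)
stream = [1, 1, 2, 3, 3, 3, 1]
joined = [(k, cache.lookup(k, R.__getitem__)) for k in stream]
```

In this run, key 3 is refused on its first appearance and displaces key 2 only once it has been seen more often, which is how an admission test lets the cache follow a shift in the stream without being flushed by one-off keys.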

Large Scale System for Social Media Data Warehousing

Social media data have become an integral part of business data and should be integrated into the decision process so that decisions rest on information that better reflects the true situation of the business in any field. However, social media data are unstructured and generated at a very high frequency, which exceeds the capacity of the data warehouse. In this work, we propose to extend the data warehousing process with a staging area whose core is a large-scale system implementing an information extraction process using the Storm and Hadoop frameworks to better manage their volume and frequency. For structured information extraction, mainly of events, we combine a set of techniques from NLP, linguistic rules, and machine learning to accomplish the task. Finally, we propose a suitable data warehouse conceptual model for event modeling and integration with the enterprise data warehouse using an intermediate table called a Bridge table. For application and experiments, we focus on extracting drug abuse events from Twitter data and modeling them in the Event Data Warehouse.

Understanding the Concept of Data Warehousing and Challenges in Its Implementation

The aim of this paper is to explain the concept of data warehousing and how it is implemented. Data warehousing relates to the analysis of an organisation's data, and it makes the analysis process easier for the organisation's workers. The paper also explains two approaches that are followed in data warehousing. The process of implementing a data warehouse is also discussed, along with certain challenges in creating one.

An Innovative Method to Extract Data in a Real-time Data Warehousing Environment

ETL (Extract, Transform, and Load) is an essential process required to perform data extraction in knowledge discovery in databases and in data warehousing environments. The ETL process aims to gather data available from operational sources, process them, and store them in an integrated data repository. The ETL process can also be performed in a real-time data warehousing environment, storing data in a data warehouse. This paper presents a new method named Data Extraction Magnet (DEM) to perform the extraction phase of the ETL process in a real-time data warehousing environment based on non-intrusive, tag, and parallelism concepts. DEM has been validated on a dairy farming domain using synthetic data. The results showed a large performance gain over the traditional trigger technique and that real-time requirements were met.

Data Warehousing for Formula One (Racing) Popularity Rating Using Pentaho Tools

A Framework for Developing an Enterprise Data Warehousing Solution

Developing a Corporate Data Warehousing Strategy

Mapping the Road to Elimination: A 5-Year Evaluation of Implementation Strategies Associated with Hepatitis C Treatment in the Veterans Health Administration

Background: While few countries and healthcare systems are on track to meet the World Health Organization’s hepatitis C virus (HCV) elimination goals, the US Veterans Health Administration (VHA) has been a leader in these efforts. We aimed to determine which implementation strategies were associated with successful national viral elimination implementation within the VHA.

Methods: We conducted a five-year, longitudinal cohort study of the VHA Hepatic Innovation Team (HIT) Collaborative between October 2015 and September 2019. Participants from 130 VHA medical centers treating HCV were sent annual electronic surveys about their use of 73 implementation strategies, organized into nine clusters as described by the Expert Recommendations for Implementing Change taxonomy. Descriptive and nonparametric analyses assessed strategy use over time, strategy attribution to the HIT, and strategy associations with site HCV treatment volume and rate of adoption, following the Theory of Diffusion of Innovations.

Results: Between 58 and 109 medical centers provided responses in each year, including 127 (98%) responding at least once, and 54 (42%) responding in all four implementation years. A median of 13–27 strategies were endorsed per year, and 8–36 individual strategies were significantly associated with treatment volume per year. Data warehousing, tailoring, and patient-facing strategies were most commonly endorsed. One strategy, “identify early adopters to learn from their experiences,” was significantly associated with HCV treatment volume in each year. Peak implementation year was associated with revising professional roles, providing local technical assistance, using data warehousing (i.e., dashboard population management), and identifying and preparing champions. Many of the strategies were driven by a national learning collaborative, which was instrumental in successful HCV elimination.

Conclusions: VHA’s tremendous success in rapidly treating nearly all Veterans with HCV can provide a roadmap for other HCV elimination initiatives.

Explicitly Disclosing Clients Illness Catalogue Using Data Science Techniques

Across the world, in our day-to-day life, we come across various medical inaccuracies caused by unreliable patient recollection. Statistically, communication problems are the most significant factor hampering the diagnosis of patients’ diseases. This paper presents a theoretical solution for delivering patient care more adequately. In these pandemic days, the communication gap between the patient and the physician has begun to decline to a nominal level. This paper demonstrates a solution and a stepping stone to the complete digitalization of the client’s illness catalogue. To attain the solution, we use various pre-existing technologies such as data warehousing, database management systems, cloud computing, and big data. We also maintain a secure infrastructure that protects the client’s data privacy.

Keywords: Illness catalogue, cloud computing, data warehousing, database management systems, big data.

Data Warehousing


International Conference on Conceptual Modeling

ER 1998: Advances in Database Technologies, pp 81–92

Recent Advances and Research Problems in Data Warehousing

  • Sunil Samtani
  • Mukesh Mohania
  • Vijay Kumar
  • Yahiko Kambayashi
  • Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1552)

In recent years, the database community has witnessed the emergence of a new technology, namely data warehousing. A data warehouse is a global repository that stores pre-processed queries on data which resides in multiple, possibly heterogeneous, operational or legacy sources. The information stored in the data warehouse can be easily and efficiently accessed for making effective decisions. The On-Line Analytical Processing (OLAP) tools access data from the data warehouse for complex data analysis, such as multidimensional data analysis, and decision support activities. Current research has led to new developments in all aspects of data warehousing; however, there are still a number of problems that need to be solved to make data warehousing effective. In this paper, we discuss recent developments in data warehouse modelling, view maintenance, and parallel query processing. A number of technical issues for exploratory research are presented and possible solutions are discussed.

  • Data Warehouse
  • Integrity Constraint
  • View Update

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Author information

Authors and affiliations.

Dept. of Computer Science Telecommunications, University of Missouri-Kansas City, Kansas City, MO, 64110, USA

Sunil Samtani & Vijay Kumar

Advanced Computing Research Centre, School of Computer and Information Science, University of South Australia, Mawson Lakes, 5095, Australia

Mukesh Mohania

Department of Social Informatics, Kyoto University, Kyoto, 606-8501, Japan

Yahiko Kambayashi


Editor information

Editors and affiliations.

Graduate School of Informatics, Dept. of Social Informatics, Kyoto University, 606-8501, Yoshida Sakyo Kyoto, Japan

Dept. of Computer Science Clear Water Bay, Hong Kong University of Science and Technology, Hong Kong, China

Dik Lun Lee

School of Applied Science, Nanyang Technological University, N4-2A-12, Nanyang Avenue, 639798, Singapore

Ee-Peng Lim

School of Computer and Information Science The Levels Campus, University of South Australia, Mawson Lakes, 5095, S.A., Australia

Mukesh Kumar Mohania

Faculty of Science, Dept. of Information Sciences, Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, 112-8610, Tokyo, Japan

Yoshifumi Masunaga


Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper.

Samtani, S., Mohania, M., Kumar, V., Kambayashi, Y. (1999). Recent Advances and Research Problems in Data Warehousing. In: Kambayashi, Y., Lee, D.L., Lim, EP., Mohania, M.K., Masunaga, Y. (eds) Advances in Database Technologies. ER 1998. Lecture Notes in Computer Science, vol 1552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-49121-7_7


DOI: https://doi.org/10.1007/978-3-540-49121-7_7

Publisher Name: Springer, Berlin, Heidelberg

Print ISBN: 978-3-540-65690-6

Online ISBN: 978-3-540-49121-7

eBook Packages: Springer Book Archive

  • J Am Med Inform Assoc
  • v.21(4); 2014 Jul

Clinical research data warehouse governance for distributed research networks in the USA: a systematic review of the literature

John H Holmes

1 Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA

Thomas E Elliott

2 University of Minnesota Medical School, HealthPartners Institute for Education and Research, Duluth, Minnesota, USA

Jeffrey S Brown

3 Harvard Medical School Department of Population Medicine, Boston, Massachusetts, USA

Marsha A Raebel

4 Kaiser Permanente Colorado Institute for Health Research, Denver, Colorado, USA

Arthur Davidson

5 Denver Health and Hospital Authority, Denver, Colorado, USA

Andrew F Nelson

6 HealthPartners Institute for Education and Research, Minneapolis, Minnesota, USA

Annie Chung

Pierre La Chance

7 Research Informatics, Center for Health Research, Kaiser Permanente, Portland, Oregon, USA

John F Steiner

8 Kaiser Permanente Colorado, Denver, Colorado, USA


Objective

To review the published, peer-reviewed literature on clinical research data warehouse governance in distributed research networks (DRNs).

Materials and methods

Medline, PubMed, EMBASE, CINAHL, and INSPEC were searched for relevant documents published through July 31, 2013 using a systematic approach. Only documents relating to DRNs in the USA were included. Documents were analyzed using a classification framework consisting of 10 facets to identify themes.

Results

6641 documents were retrieved. After screening for duplicates and relevance, 38 were included in the final review. A peer-reviewed literature on data warehouse governance is emerging, but is still sparse. Peer-reviewed publications on UK research network governance were more prevalent, although not reviewed for this analysis. All 10 classification facets were used, with some documents falling into two or more classifications. No document addressed costs associated with governance.

Discussion

Even though DRNs are emerging as vehicles for research and public health surveillance, understanding of DRN data governance policies and procedures is limited. This is expected to change as more DRN projects disseminate their governance approaches as publicly available toolkits and peer-reviewed publications.

Conclusions

While peer-reviewed, US-based DRN data warehouse governance publications have increased, DRN developers and administrators are encouraged to publish information about these programs.

Background and significance

An enterprise data warehouse presents opportunities to conduct previously impractical studies of rare exposures or outcomes where very large sample sizes are needed, such as population-based surveillance, treatment safety, or comparative effectiveness research. 1 However, even a large healthcare organization may have insufficient subjects to support such studies. Increasingly, researchers are turning to distributed research networks (DRNs), which provide access to health-related data from multiple organizations. These data include, but are not limited to, clinical, laboratory, pharmacy, and procedure data and may be collected in outpatient and inpatient settings. In a DRN, the input is a user-generated query which may be posed as a natural-language request, a structured request through a web-based form, or program code. The output could be aggregated counts, statistical graphics, or de-identified individual-level data. This approach helps protect patient privacy and confidentiality, and addresses the proprietary concerns of the enterprise itself.

DRNs typically include a virtual repository or warehouse 2 3 and a distributed communication model. Data from multiple sources reside on local servers and authorized users obtain access using agreed-upon principles through a single, secure portal and query system as though concentrated in a single, unified resource. 1 4–9 Figure 1 illustrates a generic DRN.

Figure 1. A simple distributed research network. In this schematic, a researcher poses a research question to a portal, typically implemented as a web site with a structured interface that guides the construction of a query. The query is then sent to the participating sites in a predetermined format and language, such as SAS code. The sites run the query and return the result to the portal for use by the researcher, formatted as an aggregated table or de-identified record-level data, depending on the governance policies of the distributed research network.

The HMO Research Network's (HMORN's) virtual data warehouse (VDW) is an example of such a resource. We use VDW as a generic term here to represent the virtual data repository used by DRNs. In a VDW, data are standardized based on a common data model that enforces uniform data element naming conventions, definitions, and data storage formats. 1 10–13 Both single-use 14–16 and multi-use 6 12 15 17 networks have been created.

The DRN model imposes many governance challenges. 18 Data governance has been defined as ‘the high level, corporate, or enterprise policies or strategies that define the purpose for collecting data, and intended use of data’ 13 or more specifically, ‘the process by which responsibilities of stewardship are conceptualized and carried out,’ where such stewardship may include methods for acquiring, storing, aggregating, de-identifying, and releasing data for use. 10 Data governance within DRNs must address regulations and policies established at institution, network, and/or federal levels. Recognizing the need for DRN standards and governance to protect information originating in routine patient care, the federal Query Health Initiative 19 seeks to develop and implement standards for ‘distributed population health queries to certified electronic health records.’ 20

We conducted a systematic review of the indexed, peer-reviewed literature on DRN data governance. We were interested in the following questions: How are DRN data made available to researchers? What data standards are used in the DRN? Who can query such data? Who can access query results? What specific policies govern the use, security, and retention of these data and query results? How is data governance evaluated? Finally, what procedures have been defined for training users of DRN resources?

Search strategy

We searched PubMed, PubMed Central, EMBASE, CINAHL, and INSPEC for documents published through July 31, 2013. We included original English-language research articles, reviews, and indexed conference papers and abstracts that described DRN data governance. We excluded documents describing networks outside of the USA due to regulatory differences. With the exception of technical reports, gray (unpublished) literature was excluded, as were editorials. We used the search terms shown in box 1, expanded as indicated by the truncation (‘$’) character.

Box 1 Search terms

‘data govern$’

‘distributed research network$’

‘distributed research’

‘distributed network$’

‘research network$’

‘multi-institutional research’

‘data’ AND ‘govern$’

‘data’ AND ‘research network$’ AND ‘govern$’
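To make the truncation and AND semantics of box 1 concrete, the following sketch translates each term into a regular expression and tests a piece of text against the full set of queries. It is only an illustration: the bibliographic databases apply their own query syntax, and the function names and the choice of whole-word, case-insensitive matching are assumptions.

```python
import re

# Each inner list is one query; a query with several terms is an AND of those terms.
QUERIES = [
    ["data govern$"],
    ["distributed research network$"],
    ["distributed research"],
    ["distributed network$"],
    ["research network$"],
    ["multi-institutional research"],
    ["data", "govern$"],
    ["data", "research network$", "govern$"],
]


def term_to_regex(term):
    """Translate a term with '$' truncation into a case-insensitive regex."""
    # '$' stands for any run of trailing word characters (govern$ -> governance, governing).
    parts = [re.escape(part) for part in term.split("$")]
    pattern = r"\w*".join(parts)
    return re.compile(r"\b" + pattern + r"\b", re.IGNORECASE)


def matches(query, text):
    """True when every term of the query occurs somewhere in the text."""
    return all(term_to_regex(term).search(text) for term in query)


def is_hit(text):
    """True when the text satisfies at least one query from the list."""
    return any(matches(query, text) for query in QUERIES)


print(is_hit("Governing data in a multi-site study"))  # satisfied by 'data' AND 'govern$'
print(is_hit("A cohort study of statin use"))          # satisfies none of the queries
```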

A document was defined as relevant if it contained information about multi-institutional research data, research networks, and governance. Primary documents were examined for additional relevant documents that were also reviewed and added into the analytic corpus.

Analytic strategy

We based our analysis on a faceted classification framework, derived first deductively using the ‘10 Universal Components of a Data Governance Program’ (DGI Data Governance Framework, http://www.datagovernance.com/dgi_framework.pdf ) as a high-level taxonomy (see online supplementary appendix 1). Other governance frameworks were not determined to be suitable for our analysis. We then enriched this taxonomy with concepts that emerged in our corpus. These concepts, or facets, are shown in box 2 , with reference to the Data Governance Institute (DGI) framework component(s).

Box 2 Coding tree used to classify documents

Numbers in parentheses refer to the source component of the DGI Framework

Data collation (3)

Data and process standards (1, 3, 5)

Data stewardship (1, 4, 5, 6, 8, 9, 10)

Data privacy (3,6)

Query alignment and approval (3, 5, 9)

Data use (1, 4, 7, 9)

Data security (3, 6, 10)

Data retention (1, 2, 3, 5, 6, 7, 9)

Data audits (2, 3, 4, 10)

User training (7)

Using the coding tree, two coders (AC and JHH) classified each document. Since we used a faceted classification approach, documents were not restricted to only one category. The two coders compared their classifications and resolved any discrepancies by consensus.
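The coding tree and the two-coder comparison can be sketched as a small data structure. The facet names and DGI component numbers below are copied from box 2; the function name and the example codings are hypothetical, and the consensus step itself remains a human discussion that the code only prepares.

```python
# Facet -> DGI Framework component numbers, as listed in box 2.
CODING_TREE = {
    "Data collation": (3,),
    "Data and process standards": (1, 3, 5),
    "Data stewardship": (1, 4, 5, 6, 8, 9, 10),
    "Data privacy": (3, 6),
    "Query alignment and approval": (3, 5, 9),
    "Data use": (1, 4, 7, 9),
    "Data security": (3, 6, 10),
    "Data retention": (1, 2, 3, 5, 6, 7, 9),
    "Data audits": (2, 3, 4, 10),
    "User training": (7,),
}


def compare_codings(coder_a, coder_b):
    """Split two coders' facet sets into agreed facets and discrepancies."""
    unknown = (coder_a | coder_b) - set(CODING_TREE)
    if unknown:
        raise ValueError(f"facets not in the coding tree: {sorted(unknown)}")
    agreed = coder_a & coder_b     # both coders assigned the facet
    disputed = coder_a ^ coder_b   # only one coder assigned the facet
    return agreed, disputed


# Hypothetical codings of one document; a document may carry several facets.
first_coder = {"Data privacy", "Data security"}
second_coder = {"Data privacy", "Data use"}
agreed, disputed = compare_codings(first_coder, second_coder)
```

Here `disputed` holds the facets the two coders would need to discuss before the document's final classification is recorded.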

This study was approved by the Kaiser Permanente Colorado (KPCO) Institutional Review Board.

Our search retrieved 6641 documents. After screening for duplicates and relevance, 39 were included in the final review. Figure 2 details the document retrieval process.

Figure 2. Flow of document retrieval.

Table 1 provides citations for the 39 documents in the final corpus, ordered by first author, with the facets they cover.

Table 1

List of documents contained in the final corpus for review, with their contributions, classified according to the faceted framework used in this review

Facet 1: Data collation

Data collation refers to an organization's policies and procedures pertaining to assembling data specifically for research purposes.

Data sources include electronic medical records, 16 21–24 pharmacy and laboratory databases, 23 administrative billing claims, 1 3–6 8 9 and health plan enrollment data. 12 16 21 23 25 The wide variety of sources poses challenges for data collation. Data represent different concept domains (such as drugs, vital signs, or claims), and are syntactically and semantically heterogeneous. For example, body temperature might be represented at one site using Fahrenheit and at another using Celsius. Standards are required for successful data collation. Policies and procedures for addressing these standards were discussed in documents considered in facet 2.
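The body temperature example can be made concrete with a short sketch that converts site-specific values to a single unit before collation. The record layout, field names, and the choice of Celsius as the target unit are assumptions made for illustration only.

```python
def to_celsius(value, unit):
    """Convert a temperature reading to degrees Celsius."""
    unit = unit.strip().upper()
    if unit in ("C", "CELSIUS"):
        return float(value)
    if unit in ("F", "FAHRENHEIT"):
        return (float(value) - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit: {unit!r}")


# Hypothetical records from two sites that store temperature differently.
site_a = [{"patient": "a1", "temp": 98.6, "unit": "F"}]
site_b = [{"patient": "b1", "temp": 37.0, "unit": "C"}]

# After conversion both sites report the same concept in the same unit.
harmonized = [
    {"patient": r["patient"], "temp_c": round(to_celsius(r["temp"], r["unit"]), 1)}
    for r in site_a + site_b
]
```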

Facet 2: Data and process standards

A data standard promotes syntactical and semantic consistency by enforcing a pre-determined set of data representation requirements for each DRN site. A process standard refers to the format, language, and content of queries, data models, and processes that affect DRN operation. Both standards are important for interoperability, data capture and accuracy, and analysis. Several articles described these attributes. 2 4 8 13 24 How these standards are created and enforced varies, however. In some cases, a coordinating center develops data standards that all participating sites uphold, while in others data standards are adapted to a common data model that applies to all sites. 26 Some DRNs enforce consistency by providing standards for queries that generate results in a common format and that meet system and resource requirements. 1 3 8 27 The HMORN established a VDW Operational Committee that has a working group which is responsible for overseeing data and process standards. 28

A paper on the Mini-Sentinel Common Data Model (MSCDM) mentioned that partners were surveyed to determine what data formats should be included. 4 The Cancer Research Network created a single data dictionary that ‘guides the assemblage of standardized site-specific databases in each organization.’ 3 In the case of the Cardiovascular Research Network, 21 all data are structured into a standardized format in a VDW. This comprises: (1) datasets stored behind separate security firewalls at each site including identical variable definitions, labels, coding, and definitions; (2) informatics tools that facilitate data storage, retrieval, processing, and management; and (3) regularly updated documentation of all data elements.

Facet 3: Data stewardship

Data stewardship refers to the way results are curated at local and requesting sites. It involves oversight from legal, auditing, and compliance departments, executive leadership, and institutional review boards. Bloomrosen considered stewardship as central to data governance. 29 In a DRN, where results are transferred outside local institutions, it is often difficult to determine who owns these results. Decisions about data ownership and stewardship affect data accessibility by those outside the contributing organization, even if they are DRN members. The Wisconsin Network for Health Research (WiNHR) established a central authority to govern ownership and stewardship concerns. 26 All institutions in this network are represented on this committee, and have equal participation and authority in promulgating policies and procedures for stewardship.

One key benefit of a DRN is that participating sites retain local control of their data. Most DRNs considered here store their data behind local firewalls and have site-specific data protection, access, and privacy policies. 1 4 9 11 21–23 25 As mentioned by Forrow 5 and by Curtis, 4 the Mini-Sentinel Network complies with the standards imposed by the US Federal Information Security Management Act of 2002 and the HIPAA Security Rule. To this end, Lazarus 22 notes that local information services staff need to check that there are no ‘backdoors’ that could compromise system security.

Permission to query data in a DRN is governed by the purpose of access and use and by authentication and authorization policies contained in data use agreements. McMurry developed a Distributed Access Control Framework for this purpose. 7 This system records an audit trail of the identities of the investigator and agency and the time of query. This allows data partners to challenge queries and/or deny access. Shapiro created a real-time system to certify prospective data partners’ credentials. 16 Mini-Sentinel policies note that sites may use their own data for any purpose they deem appropriate, but written approval from each participating partner is required for any use of network data for other purposes. 4
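The kind of audit trail described here can be sketched as follows. This is not McMurry's Distributed Access Control Framework; it is a minimal illustration, with hypothetical class and field names, of recording who asked what and when, so that a data partner can later challenge or deny a query.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class QueryAuditEntry:
    investigator: str
    agency: str
    submitted_at: datetime
    query_text: str


class AccessController:
    """Checks authorization and logs every query attempt (assumed design)."""

    def __init__(self, authorized):
        self.authorized = set(authorized)  # (investigator, agency) pairs allowed to query
        self.audit_log = []

    def submit(self, investigator, agency, query_text):
        """Record the attempt, then report whether the requester is authorized."""
        entry = QueryAuditEntry(investigator, agency, datetime.now(timezone.utc), query_text)
        self.audit_log.append(entry)  # denied attempts are logged too
        return (investigator, agency) in self.authorized


# Hypothetical requesters and queries.
controller = AccessController({("dr_lee", "Site A")})
allowed = controller.submit("dr_lee", "Site A", "count patients with diabetes")
denied = controller.submit("unknown_user", "Site Z", "list all patients")
```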

Facet 4: Data privacy

The tension between protecting both patient and organizational privacy and confidentiality, and the need to use clinical and administrative data for research is exacerbated by the HIPAA Privacy Rule. Several DRNs have data access review committees that review proposed secondary uses of data for research. 7 15 17 One group, albeit outside the context of a DRN, has developed a statistical method for releasing secondary data without compromising patient privacy. 30 The HMORN has adopted a streamlined procedure for institutional review board (IRB) review across the network, 31 as well as a SAS macro that identifies protected health information before data are released to requesters. 32

Most DRNs require that transmitted data be de-identified. Parwani 24 and Patel 2 use ‘honest brokers,’ third parties pre-approved by the DRN's responsible IRB, to de-identify medical record information through automated or manual methods. Only the honest brokers have access to the linkage codes between data and identifier. Local pre-processing of protected health information to avoid its transfer is mentioned in several publications but details are lacking. 7 12 22
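The honest broker role can be illustrated with a short sketch in which identifiers are stripped and replaced by a random linkage code that only the broker can map back. The field names are hypothetical, and real de-identification covers far more fields than the two shown; this is an assumption-laden illustration, not the procedure used by Parwani or Patel.

```python
import secrets


class HonestBroker:
    """Holds the only mapping between patient identifiers and linkage codes."""

    def __init__(self):
        self._code_for_id = {}  # identifier -> linkage code; never released

    def _linkage_code(self, patient_id):
        # Reuse the same code for the same patient so records stay linkable.
        if patient_id not in self._code_for_id:
            self._code_for_id[patient_id] = secrets.token_hex(8)
        return self._code_for_id[patient_id]

    def deidentify(self, record, identifier_fields=("patient_id", "name")):
        """Return a copy of the record with identifiers replaced by a linkage code."""
        cleaned = {k: v for k, v in record.items() if k not in identifier_fields}
        cleaned["linkage_code"] = self._linkage_code(record["patient_id"])
        return cleaned


# Hypothetical records for the same patient.
broker = HonestBroker()
first = broker.deidentify({"patient_id": "123", "name": "Ann", "dx": "C18"})
second = broker.deidentify({"patient_id": "123", "name": "Ann", "dx": "C25"})
```

Because the linkage code is random rather than derived from the identifier, a recipient of `first` and `second` can tell the records belong to one patient but cannot recover who that patient is.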

IRB oversight is not required for public health surveillance activities. The Privacy Rule permits the disclosure of protected health information if the organization tracks such disclosures. 5

Facet 5: Query alignment and approval

Data queries should be approved by the data providers to ensure alignment with privacy protections and available resources. This is often accomplished through a portal that restricts queries to a pre-determined set of data elements. The Query Execution Manager is an example of an asynchronous ‘pull’ approach, which incorporates data providers in the query approval and execution process. 11 In the ‘pull’ approach, programmer-analysts and/or investigators at participating sites receive and review a new query, and decide whether to run it against their local data. The queries are accessed through a web portal, encrypted email, or similar interface. The encrypted results are uploaded back to the hub or original requestor, usually in delimited (csv) or SAS files. 1 4 7 11 25 Other DRNs that use this approach include the Mini-Sentinel Network 4 and the Nationwide Health Information Network (Healtheway). 7
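A rough sketch of the 'pull' approach follows. Each site receives the query, decides locally whether to run it, and returns only an aggregated count as delimited text. The class and field names are hypothetical, the human review step is reduced to an allow-list check, and the encryption of results mentioned above is omitted for brevity.

```python
import csv
import io


class Site:
    """A participating site that keeps its records local."""

    def __init__(self, name, records, approved_fields):
        self.name = name
        self.records = records  # record-level data never leaves the site
        self.approved_fields = set(approved_fields)

    def review_and_run(self, query):
        """Return a CSV string with an aggregated count, or None if the site declines."""
        # Stand-in for review by a local analyst: decline queries on unapproved fields.
        if query["field"] not in self.approved_fields:
            return None
        count = sum(1 for r in self.records if r.get(query["field"]) == query["value"])
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["site", "field", "value", "count"])
        writer.writerow([self.name, query["field"], query["value"], count])
        return buf.getvalue()


# Hypothetical sites: A approves diagnosis queries, B approves none.
sites = [
    Site("A", [{"dx": "diabetes"}, {"dx": "diabetes"}, {"dx": "asthma"}], {"dx"}),
    Site("B", [{"dx": "diabetes"}], set()),
]
query = {"field": "dx", "value": "diabetes"}
responses = {site.name: site.review_and_run(query) for site in sites}
```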

Some systems allow researchers within the network to query local data synchronously. Harvard's Shared Health Research Information Network (SHRINE) is one example. 33 In this ‘push’ query type, the query is directly processed by the remote query sender. In contrast, the HMORN does not permit a researcher external to the organization to directly submit queries to local data, but an external researcher may be sent a study dataset under an IRB-approved protocol. Portal approaches taken by these and others 2 24 also include ensuring that the user is authorized to request the data specified in the query. Several publications elaborated on the tools and format that DRNs used to conduct data queries. 1 4 7 11 25

Facet 6: Data use

Data use refers to the purposes for which data are requested, accessed, and analyzed. These activities fall into three categories: activities preparatory to research (PTR); activities subsequent to IRB approval of a research protocol, such as cohort identification for descriptive and multivariable analyses; and public health surveillance. PTR activities include queries that return aggregated counts or de-identified datasets containing only aggregated count data, typically to assess the feasibility of a study or to develop sample size calculations.

In contrast, a limited dataset is often required for cohort identification and descriptive statistical analyses. However, in a distributed network, it is difficult to create the single, observation-level dataset required for multivariable analysis. For such analyses, sites may create a pooled analysis dataset, perhaps containing covariance matrices obtained from running separate regression analyses at each site, which are then combined for further regression analysis. 8 34 Methods for accomplishing this more easily are under development. 35 36
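To illustrate how site-level summaries can substitute for a single observation-level dataset, the following sketch fits a one-covariate linear regression from per-site aggregate sums. It is a simplified stand-in, assuming ordinary least squares with one predictor; it is not the specific method used in the cited studies, and the function names `site_summary` and `pooled_fit` are hypothetical.

```python
def site_summary(xs, ys):
    """Computed locally at each site; only these aggregate sums are shared."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def pooled_fit(summaries):
    """Combine site summaries into the slope and intercept of y = a + b*x."""
    keys = ("n", "sx", "sy", "sxx", "sxy")
    n, sx, sy, sxx, sxy = (sum(s[k] for s in summaries) for k in keys)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance across the pooled sites")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Because the least-squares estimates depend on the data only through these sums, adding them across sites gives the same coefficients as fitting the model to the stacked rows, without any row leaving its site.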

Facet 7: Data security

Several documents described policies or procedures for secure transmission and storage of results through virtual private networks, data encryption, firewalls, and password protection. These networks included the Cancer Research Network, 17 Bioterrorism Syndromic Surveillance Demonstration Program, 22 and Mini-Sentinel Network. 4–6 Each included architectural as well as procedure information. Password protection for access to query software was mentioned only in Patel 2 and Parwani 24 ; these two networks (the Pennsylvania Cancer Alliance Bioinformatics Consortium and the Early Detection Research Network colorectal and pancreatic neoplasm virtual biorepository) utilize a centralized database, in contrast to other DRNs.

Facet 8: Data retention

As Willison mentioned, data retention should be a concern among partners in a research network. 37 Only one document in our corpus mentioned procedures for data retention. McGraw states that ‘data partners are required to keep the information that has been transformed into MSCDM and used to respond to queries for 3 years.’ 6 If additional data are needed in the case of a suspected safety signal, the data partner is ‘expressly limited to collecting additional data solely for the purpose of confirming the signal—the data must be destroyed within 3 years according to national standards for data destruction.’ 6

Facet 9: Data audits

Data audits are performed to evaluate information system and data integrity, identify unauthorized system access, and ensure that data are appropriately collected and represented. In any healthcare or health research context, data audits are required under Section 13411 of the HITECH Act. In a DRN, data audits also ensure that data are used within approved research protocols.

Several DRNs in our review have well-defined auditing functions. The Cancer Research Network has a central auditing authority that ensures that each participating institution has the technical support for maintaining security and privacy logs. Auditing cannot be left solely to the local level to address systematic security and privacy issues, but local sites may add auditing procedures. 17 In the Nationwide Health Information Network, the system logs the identity of the requestor, the identity of the agency that certified the investigation, and the time of query. This audit trail allows data providers to identify controversial credentialing and challenge agencies’ queries and deny access. 7 The Early Detection Research Network has an audit review system in which 5% of new entries are re-examined by honest brokers, the cancer registrar, and data managers. Findings and recommendations are submitted to the project coordinating committee. 24
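The two auditing mechanisms described above, a per-query audit trail and re-examination of a fixed fraction of new entries, can be sketched as follows. This is an illustrative outline only: the class and function names are hypothetical, the record fields mirror the three items the text lists (requestor, certifying agency, time of query), and random sampling is an assumption about how the 5% might be chosen.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    requestor: str
    certifying_agency: str
    queried_at: datetime


class AuditLog:
    def __init__(self):
        self._records = []

    def record(self, requestor, certifying_agency, queried_at=None):
        # Default to the current UTC time when no timestamp is supplied.
        when = queried_at or datetime.now(timezone.utc)
        entry = AuditRecord(requestor, certifying_agency, when)
        self._records.append(entry)
        return entry

    def by_agency(self, agency):
        # Lets a data provider inspect every query a given agency certified.
        return [r for r in self._records if r.certifying_agency == agency]


def sample_for_review(entries, fraction=0.05, seed=None):
    """Pick a random fraction of entries for re-examination (at least one)."""
    entries = list(entries)
    if not entries:
        return []
    k = max(1, round(len(entries) * fraction))
    return random.Random(seed).sample(entries, k)
```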

Facet 10: User training

Training new users of any DRN is essential for ensuring adherence to policies, procedures, and standards. Our review of the literature revealed two documents where user training was described. In one, the HMORN analyzed past user experiences to assist with training. 38 39 In the other, drawing on the experiences of the HMORN and practice-based research networks leveraged by the Clinical Translational Science Award, researchers developed an extensive training resource, the Research Toolkit. 40 The Research Toolkit is a large repository of scholarly articles, IRB documents, and proposal development guides. Although not reviewed here, users should know that it contains a substantial amount of information about data governance as it applies to multi-site studies.

Our review identified practices of, and challenges posed by, the governance of clinical research DRN data warehouses. A recent review that focused on the growth of health information technology and particularly electronic medical records and their use in comparative effectiveness research, further highlighted these challenges. 41 42

The literature on DRN data warehouse governance is immature, with only 39 documents retrieved in a broad search of the biomedical and computer and information science literature. Only a few of the 20 Clinical Data Research Networks identified in a recent technical report 43 have published information about their data governance in the peer-reviewed literature. Of note, many more documents (N=183) describing non-US systems were retrieved, primarily describing DRNs and related systems in the UK. Much is still to be learned about the challenges posed for data warehouse governance for DRNs in the USA. For example, research is a small component of managed care organizations (MCOs), and research within the MCO is often dominated by day-to-day organizational and financial demands. A research advocate should be involved in organizational decisions to ensure that researchers can take advantage of in-house expertise, such as data specialists, governance experts, and regulatory compliance professionals.

Several additional implications and recommendations emanate from our analysis. First, researchers and public health surveillance experts should develop standard operating procedures for safeguarding data, conduct periodic compliance audits, and provide educational and technical support to facilitate uptake procedures. Following the lead of the PRIMER project, 44 procedures should be documented and published so that they may be evaluated and used by others. Second, codification of DRN data warehouse governance policies and procedures should be a priority as the DRN is designed and implemented and revised periodically as new demands arise. A meta-policy should be in place that provides oversight and approval by representatives across the DRN. Third, an independent oversight function within the DRN should review the data and processes to foster trust among data contributors. Fourth, a shortcoming of the literature is that costs associated with data warehouse governance have not been addressed.

The framework we used in our analytic review of the literature is but one of several. We used the framework we deemed most amenable to modification for the DRN context. However, a new framework or taxonomy could be developed specifically for the DRN community to use in evaluating governance as the DRN area evolves. For example, the Scalable PArtnering Network for Comparative Effectiveness Research (SPAN), 45 a DRN with 11 participating sites, has begun a framework for the DRN community to use that is detailed in ‘The SPAN: Purpose, Structure, and Operations’ document, posted in the AcademyHealth Repository ( http://repository.academyhealth.org/govtoolkit/3/ ). Although not meeting our corpus inclusion criteria, this document provides the SPAN governance guidelines. Finally, few of the documents in our corpus described policies for complying with HIPAA or IRB requirements, and we recommend that identifying and cataloging these policies should be undertaken in a comprehensive study that includes the gray literature.

Above all, it is important to consider that the DRN data warehouse governance is highly specific to the partnering institutions, the target research domain(s), and the network user community. Furthermore, numerous DRNs were not represented in our corpus because no indexed literature was available for them. The recent compendium of research networks provided by Ohno-Machado et al 43 is an excellent resource for those seeking to understand their function.

Limitations

Much governance documentation resides in the gray literature, such as web sites and industry white papers. The primary limitation of our review is reliance on the indexed scientific literature. We chose to restrict our document corpus to this literature because it has undergone peer review and includes reports of data warehouse governance specifically in the DRN domain. Few DRNs have published materials about data governance, and the seeming dominance of the HMORN and Mini-Sentinel Network in our review reflects the fact that they have published relatively extensively. We stress here that this review is intended as a starting point for those working in the area of data governance and DRNs. A more comprehensive review of data governance policies and procedures will require a much larger study involving detailed primary data collection from all types of research data networks.

As we develop data resources to support a learning health system, 46 a consistent framework is necessary to govern an increasingly networked environment. Making sure that clinical research DRNs are properly governed will increase public trust and limit risk to, and encourage greater participation by, those holding primary data sources.

Clinical research DRN data warehouse governance policies provide important protections beyond data infrastructure and security. Articulating written governance agreements assists in developing and maintaining a common vision and purpose within the DRN, fosters trust and collaboration across the DRN data providers, and provides a template for addressing issues as they arise. Researchers planning to implement or improve existing data warehouse governance for DRNs need better guidance from the literature. However, as our review suggests, the dearth of DRN governance documents in the peer-reviewed literature indicates that this might not be the appropriate venue for publishing governance policies due to an inability to pass peer review, or be compatible with journal scope or editorial policy. This poses substantial difficulties for the informatics and clinical research communities as we move forward to a more distributed research environment. We thus encourage DRNs to publish information on publicly available websites about their data warehouse governance programs used to support DRNs and to develop and publish metrics that can be used to assess the impact of network governance on the efficiency of research and the protection of patients and participating organizations.

Contributors: JHH, TEE, AFN, MAR, JFS, AD, and PLC contributed to conceptualization of the project. AC and JHH performed the searches and collected the data. JHH, AC, TEE, MAR, and AD performed data analysis. JHH, TEE, AFN, MAR, JFS, AD, PLC and AC participated in writing and editing the manuscript.

Funding: This article was prepared by the Scalable PArtnering Network (SPAN) for Comparative Effectiveness Research (CER), supported by grant number R01HS019912 from the Agency for Healthcare Research and Quality. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.

Competing interests: None.

Provenance and peer review: Not commissioned; externally peer reviewed.

Data warehouse architecture and design



Title: ReALM: Reference Resolution As Language Modeling

Abstract: Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.


Apple AI research: ReALM is smaller, faster than GPT-4 when parsing contextual data

Wesley Hilliard

Apple is working to bring AI to Siri

Artificial Intelligence research at Apple keeps being published as the company approaches a public launch of its AI initiatives in June during WWDC. There has been a variety of research published so far, including an image animation tool.

The latest paper was first shared by VentureBeat. The paper details something called ReALM, short for Reference Resolution As Language Modeling.

Having a computer program perform a task based on vague language inputs, like how a user might say "this" or "that," is called reference resolution. It's a complex issue to solve since computers can't interpret images the way humans can, but Apple may have found a streamlined resolution using LLMs.

When speaking to smart assistants like Siri, users might reference any amount of contextual information, such as background tasks, on-display data, and other non-conversational entities. Traditional parsing methods rely on incredibly large models and reference materials like images, but Apple has streamlined the approach by converting everything to text.

Apple found that its smallest ReALM models performed similarly to GPT-4 with far fewer parameters, making them better suited for on-device use. Increasing the parameters used in ReALM made it substantially outperform GPT-4.

One reason for this performance boost is GPT-4's reliance on image parsing to understand on-screen information. Much of the image training data is built on natural imagery, not artificial code-based web pages filled with text, so direct OCR is less efficient.

Two images listing information as seen by screen parsers, like addresses and phone numbers

Converting an image into text allows ReALM to skip needing these advanced image recognition parameters, thus making it smaller and more efficient. Apple also avoids issues with hallucination by including the ability to constrain decoding or use simple post-processing.

For example, if you're scrolling a website and decide you'd like to call the business, simply saying "call the business" requires Siri to parse what you mean given the context. It would be able to "see" that there's a phone number on the page that is labeled as the business number and call it without further user prompt.
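The "call the business" scenario can be made concrete with a small sketch of the two ideas mentioned above: turning on-screen entities into text, and post-processing model output so that only real entities can be selected. Everything here is an assumption for illustration. The prompt layout, the `ScreenEntity` type, and both function names are hypothetical and are not taken from Apple's paper or any Apple API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenEntity:
    entity_id: int
    kind: str   # e.g. "phone" or "address"
    text: str   # the text as it appears on screen


def build_prompt(entities, request):
    """Serialize on-screen entities and the user's request as plain text."""
    lines = [f"[{e.entity_id}] {e.kind}: {e.text}" for e in entities]
    return (
        "Entities on screen:\n"
        + "\n".join(lines)
        + f"\nUser request: {request}\nRelevant entity ids:"
    )


def postprocess(model_output, entities):
    """Keep only ids that exist on screen, dropping any invented ones."""
    valid = {e.entity_id for e in entities}
    selected = []
    for token in model_output.replace(",", " ").split():
        if token.isdigit():
            entity_id = int(token)
            if entity_id in valid and entity_id not in selected:
                selected.append(entity_id)
    return selected
```

With this kind of filter, a model reply naming an id that is not on screen is simply discarded, which is one simple way post-processing can limit hallucinated references.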

Apple is working to release a comprehensive AI strategy during WWDC 2024. Some rumors suggest the company will rely on smaller on-device models that preserve privacy and security, while licensing other companies' LLMs for the more controversial off-device processing filled with ethical conundrums.

  • Open access
  • Published: 28 March 2024

New water accounting reveals why the Colorado River no longer reaches the sea

  • Brian D. Richter   ORCID: orcid.org/0000-0001-7216-1397 1 , 2 ,
  • Gambhir Lamsal   ORCID: orcid.org/0000-0002-2593-8949 3 ,
  • Landon Marston   ORCID: orcid.org/0000-0001-9116-1691 3 ,
  • Sameer Dhakal   ORCID: orcid.org/0000-0003-4941-1559 3 ,
  • Laljeet Singh Sangha   ORCID: orcid.org/0000-0002-0986-1785 4 ,
  • Richard R. Rushforth 4 ,
  • Dongyang Wei   ORCID: orcid.org/0000-0003-0384-4340 5 ,
  • Benjamin L. Ruddell 4 ,
  • Kyle Frankel Davis   ORCID: orcid.org/0000-0003-4504-1407 5 , 6 ,
  • Astrid Hernandez-Cruz   ORCID: orcid.org/0000-0003-0776-5105 7 ,
  • Samuel Sandoval-Solis 8 &
  • John C. Schmidt 9  

Communications Earth & Environment volume 5, Article number: 134 (2024)

  • Water resources

Persistent overuse of water supplies from the Colorado River during recent decades has substantially depleted large storage reservoirs and triggered mandatory cutbacks in water use. The river holds critical importance to more than 40 million people and more than two million hectares of cropland. Therefore, a full accounting of where the river’s water goes en route to its delta is necessary. Detailed knowledge of how and where the river’s water is used can aid design of strategies and plans for bringing water use into balance with available supplies. Here we apply authoritative primary data sources and modeled crop and riparian/wetland evapotranspiration estimates to compile a water budget based on average consumptive water use during 2000–2019. Overall water consumption includes both direct human uses in the municipal, commercial, industrial, and agricultural sectors, as well as indirect water losses to reservoir evaporation and water consumed through riparian/wetland evapotranspiration. Irrigated agriculture is responsible for 74% of direct human uses and 52% of overall water consumption. Water consumed for agriculture amounts to three times all other direct uses combined. Cattle feed crops including alfalfa and other grass hays account for 46% of all direct water consumption.
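The percentages in the abstract imply two further figures that can be checked with simple arithmetic. The sketch below is a derivation from the rounded values quoted above (74% and 52%), so its results are approximate; the function names are hypothetical.

```python
AG_SHARE_OF_DIRECT = 0.74   # irrigated agriculture as a share of direct human uses
AG_SHARE_OF_OVERALL = 0.52  # irrigated agriculture as a share of overall consumption


def direct_share_of_overall(ag_direct, ag_overall):
    """Share of overall consumption that is direct human use.

    If agriculture = ag_direct * direct and agriculture = ag_overall * overall,
    then direct / overall = ag_overall / ag_direct.
    """
    return ag_overall / ag_direct


def ag_to_other_direct_ratio(ag_direct):
    """Agricultural consumption relative to all other direct uses combined."""
    return ag_direct / (1.0 - ag_direct)


direct_share = direct_share_of_overall(AG_SHARE_OF_DIRECT, AG_SHARE_OF_OVERALL)
indirect_share = 1.0 - direct_share
ratio = ag_to_other_direct_ratio(AG_SHARE_OF_DIRECT)
```

Under these rounded inputs, direct human uses come to roughly 70% of overall consumption, leaving roughly 30% for reservoir evaporation and riparian/wetland evapotranspiration, and agriculture is about 2.8 times all other direct uses, consistent with the abstract's "three times".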

Introduction

Barely a trickle of water is left of the iconic Colorado River of the American Southwest as it approaches its outlet in the Gulf of California in Mexico after watering many cities and farms along its 2330-kilometer course. There were a few years in the 1980s in which enormous snowfall in the Rocky Mountains produced a deluge of spring snowmelt runoff capable of escaping full capture for human uses, but for most of the past 60 years the river’s water has been fully consumed before reaching its delta 1,2. In fact, the river was overconsumed (i.e., total annual water consumption exceeding runoff supplies) in 16 of 21 years during 2000–2020 3, requiring large withdrawals of water stored in Lake Mead and Lake Powell to accommodate the deficits. An average annual overdraft of 10% during this period 2 caused these reservoirs, the two largest in the US, to drop to three-quarters empty by the end of 2022 4, triggering urgent policy decisions on where to cut consumption.

Despite the river’s importance to more than 40 million people and more than two million hectares (>5 million acres) of cropland—producing most of the vegetable produce for American and Canadian plates in wintertime and also feeding many additional people worldwide via exports—a full sectoral and crop-specific accounting of where all that water goes en route to its delta has never been attempted, until now. Detailed knowledge of how and where the river’s water is used can aid design of strategies and plans for bringing water use into balance with available supplies.

There are interesting historical reasons to explain why this full water budget accounting has not been accomplished previously, beginning a full century ago when the apportionment of rights to use the river’s water within the United States was inscribed into the Colorado River Compact of 1922 5. That Compact was ambiguous and confusing in its allocation of water inflowing to the Colorado River from the Gila River basin in New Mexico and Arizona 6, even though it accounts for 24% of the drainage area of the Colorado River Basin (Fig. 1). Because of intense disagreements over the rights to the Gila and other tributaries entering the Colorado River downstream of the Grand Canyon, the Compact negotiators decided to leave the allocation of those water rights to a later time so that the Compact could proceed 6. Arizona’s formal rights to the Gila and other Arizona tributaries were finally affirmed in a US Supreme Court decision in 1963 that also specified the volumes of Colorado River water allocated to California, Arizona, and Nevada 7. Because the rights to the Gila’s waters lie outside of the Compact allocations, the Gila has not been included in formal accounting of the Colorado River Basin water budget to date 8. Additionally, the Compact did not specify how much water Mexico, at the river’s downstream end, should receive. Mexico’s share of the river was not formalized until 22 years later, in the 1944 international treaty on “Utilization of the Waters of the Colorado and Tijuana Rivers and of the Rio Grande” (1944 Water Treaty) 9. As a result of these political circumstances, full accounting for direct water consumption at the sectoral level (in which water use is accounted according to categories such as municipal, industrial, commercial, or agricultural uses) has not previously been compiled for the Gila River basin’s water, and sectoral accounting for Mexico was not published until 2023 10.

Figure 1

The physical boundary of the Colorado River Basin is outlined in black. Hatched areas outside of the basin boundary receive Colorado River water via inter-basin transfers (also known as ‘exports’). The Gila River basin is situated in the far southern portion of the CRB in Arizona, New Mexico, and Mexico. Map courtesy of Center for Colorado River Studies, Utah State University.

The US Bureau of Reclamation (“Reclamation”)—which owns and operates massive water infrastructure in the Colorado River Basin—has served as the primary accountant of Colorado River water. In 2012, the agency produced a “Colorado River Basin Water Supply and Demand Study” 8 that accounted for both the sectoral uses of water within the basin’s physical boundaries within the US as well as river water exported outside of the basin (Fig.  1 ). But Reclamation did not attempt to account for water generated from the Gila River basin because of that sub-basin’s exclusion from the Colorado River Compact, and it did not attempt to explain how water crossing the border into Mexico is used. The agency estimated riparian vegetation evapotranspiration for the lower Colorado River but not the remainder of the extensive river system. Richter et al. 11 published a water budget for the Colorado River that included sectoral and crop-specific water consumption but it too did not include water used in Mexico, nor reservoir evaporation or riparian evapotranspiration, and it did not account for water exported outside of the Colorado River Basin’s physical boundary as illustrated in Fig.  1 . Given that nearly one-fifth (19%) of the river’s water is exported from the basin or used in Mexico, and that the Gila is a major tributary to the Colorado, this incomplete accounting has led to inaccuracies and misinterpretations of “where the Colorado River’s water goes” and has created uncertainty in discussions based on the numbers. This paper provides fuller accounting of the fate of all river water during 2000–2019, including averaged annual consumption in each of the sub-basins including exports, consumption in major sectors of the economy, consumption in the production of specific types of crops, and water consumed by reservoir evaporation and riparian/wetland evapotranspiration.

Rising awareness of water overuse and prolonged drought has driven intensifying dialog among the seven US states sharing the basin’s waters as well as between the United States, Mexico, and 30 tribal nations within the US. Since 2000, six legal agreements affecting the US states and two international agreements with Mexico have had the effect of reducing water use from the Colorado River 7 :

In 2001, the US Secretary of the Interior issued a set of “Interim Surplus Guidelines” to reduce California’s water use by 14% to bring the state within its allocation as determined in the 1963 US Supreme Court case mentioned previously. A subsequent “Quantification Settlement Agreement” executed in 2003 spelled out details about how California was going to achieve the targeted reduction.

In 2007, the US Secretary of the Interior adopted a set of “Colorado River Interim Guidelines for Lower Basin Shortages and the Coordinated Operations for Lake Powell and Lake Mead” that reduced water deliveries to Arizona and Nevada when Lake Mead drops to specified levels, with increasing cutbacks as levels decline.

In 2012, the US and Mexican federal governments signed an addendum to the 1944 Water Treaty known as Minute 319 that reduced deliveries to Mexico as Lake Mead elevations fall.

In 2017, the US and Mexican federal governments established a “Binational Water Scarcity Contingency Plan” as part of Minute 323 that provides for deeper cuts in deliveries to Mexico under specified low reservoir elevations in Lake Mead.

In 2019, the three Lower Basin states and the US Secretary of the Interior agreed to commitments under the “Lower Basin Drought Contingency Plan” that further reduced water deliveries beyond the levels set in 2007 and added specifications for deeper cuts as Lake Mead drops to levels lower than anticipated in the 2007 Guidelines.

In 2023, the states of California, Arizona and Nevada committed to further reductions in water use through the year 2026 12 .

With each of the above agreements, overall water consumption has been reduced but many scientists assert that these reductions still fall substantially short of balancing consumptive use with 21st century water supplies 2 , 13 . With all of these agreements—excepting the Interim Surplus Guidelines of 2001—set to expire in 2026, management of the Colorado River’s binational water supply is now at a crucial point, emphasizing the need for comprehensive water budget accounting.

Our tabulation of the Colorado River’s full water consumption budget (Table 1) provides accounting for all direct human uses of water as either agricultural or MCI (municipal, commercial, industrial), as well as indirect losses of water to reservoir evaporation and evapotranspiration from riparian or wetland vegetation including in the Salton Sea and in a wetland in Mexico (Cienega de Santa Clara) that receives agricultural return flows from irrigated areas in Arizona. We explicitly note that all estimates represent consumptive use, resulting from the subtraction of return flows from total water withdrawals. Table 2 provides a summary based only on direct human uses and does not include indirect consumption of water. We have provided Tables 1 and 2 in English units in our Supplementary Information as Tables SI-1 and SI-2. We have lumped municipal, commercial, and industrial (MCI) uses together because these sub-categories of consumption are not consistently differentiated within official water delivery data for cities utilizing Colorado River water. More detail on urban water use by cities dependent on the river is available in Richter 14 , among other studies.

We differentiated water consumption geographically using the ‘accounting units’ mapped in Fig.  2 , which are based on the Colorado River Basin map as revised by Schmidt 15 ; importantly, these accounting units align spatially with Reclamation’s accounting systems for the Upper Basin and Lower Basin as described in our Methods, thereby enabling readers accustomed to Reclamation’s water-use reports to easily comprehend our accounting. We have also accounted for all water consumed within the Colorado River Basin boundaries as well as water exported via inter-basin transfers. Water exported outside of the basin includes 47 individual inter-basin transfer systems (i.e., canals, pipelines, pumps) that in aggregate export ~12% of the river’s water. We note that the Imperial Irrigation District of southern California is often counted as a recipient of exported water, but we have followed the rationale of Schmidt 15 by including it as an interior part of the Lower Basin even though it receives its Colorado River water via the All American Canal (Fig.  2 ).

figure 2

The water budget estimates presented in Tables  1 and 2 are summarized for each of the seven “accounting units” displayed here.

These results confirm previous findings that irrigated agriculture is the dominant consumer of Colorado River water. Irrigated agriculture accounts for 52% of overall consumption (Table  1 ; Figs.  3 and 4 ) and 74% of direct human consumption (Table  2 ) of water from the Colorado River Basin. As highlighted in Richter et al. 11 , cattle-feed crops (alfalfa and other hay) are the dominant water-consuming crops dependent upon irrigation water from the basin (Tables  1 and 2 ; Figs.  3 and 4 ). Those crops account for 32% of all water consumed from the basin, 46% of all direct water consumption, and 62% of all agricultural water consumed (Table  1 ; Fig.  3 ). The percentage of water consumed by irrigated crops is greatest in Mexico, where they account for 86% of all direct human uses (Table  2 ) and 80% of total water consumed (Table  1 ). Cattle-feed crops consume 90% of all water used by irrigated agriculture within the Upper Basin, where the consumed volume associated with these cattle-feed crops amounts to more than three times what is consumed for municipal, commercial, or industrial uses combined.
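The three cattle-feed percentages quoted above are mutually consistent, which can be checked by chaining the shares. The following is a minimal Python sketch; the variable names are illustrative and not from the study, and the inputs are the rounded percentages given in the text, so the results agree only to rounding.

```python
# Rounded shares quoted in the text, expressed as fractions.
ag_share_of_total = 0.52    # irrigated agriculture / overall consumption
ag_share_of_direct = 0.74   # irrigated agriculture / direct human consumption
feed_share_of_ag = 0.62     # cattle-feed crops / agricultural consumption

# Cattle-feed share of a larger total = (feed / ag) * (ag / total).
feed_share_of_total = feed_share_of_ag * ag_share_of_total
feed_share_of_direct = feed_share_of_ag * ag_share_of_direct

print(f"cattle-feed share of all consumption:    {feed_share_of_total:.0%}")
print(f"cattle-feed share of direct consumption: {feed_share_of_direct:.0%}")
```

Chaining the shares in this way reproduces the 32% and 46% figures reported for cattle-feed crops.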

figure 3

All estimates based on 2000–2019 averages. Both agriculture and MCI (municipal, commercial, and industrial) uses are herein referred to as “direct human uses.” “Indirect uses” include both reservoir evaporation as well as evapotranspiration by riparian/wetland vegetation.

figure 4

Water consumed by each sector in the Colorado River Basin and sub-basins (including exports), based on 2000–2019 averages.

Another important finding is that a substantial volume of water (19%) is consumed in supporting the natural environment through riparian and wetland vegetation evapotranspiration along river courses. This analysis, made possible because of recent mapping of riparian vegetation in the Colorado River Basin 16 , is an important addition to the water budget of the Colorado River Basin, given that the only previous accounting for riparian vegetation consumption has been limited to the mainstem of the Colorado River below Hoover Dam and does not include vegetation upstream of Hoover Dam nor vegetation along tributary rivers 17 . Given that many of these habitats and associated species have been lost or have become imperiled due to river flow depletion 18 (including the river’s vast delta ecosystem in Mexico), an ecologically sustainable approach to water management would need to allow more water to remain in the river system to support riparian and aquatic ecosystems. Additionally, 11% of all water consumed in the Colorado River Basin is lost through evaporation from reservoirs.

It is also important to note a fairly high degree of inter-annual variability in each sector of water use; for example, the range of values portrayed for the four water budget sectors shown in Fig.  5 equates to 24–47% of their 20-year averages. Also notable is a decrease in water consumed in the Lower Basin between the years 2000 and 2019 for both the MCI (−38%) and agricultural sectors (−15%), which can in part be attributed to the policy agreements summarized previously that have mandated water-use reductions.

figure 5

Inter-annual variability of water consumption within the Lower and Upper Basins, including water exported from these basins. The average (AVG) values shown are used in the water budgets detailed in Tables  1 and 2 .

The water accounting in Richter et al. 11 received a great deal of media attention including a front-page story in the New York Times 19 . These stories focused primarily on our conclusion that more than half (53%) of water consumed in the Colorado River Basin was attributable to cattle-feed crops (alfalfa and other hays) supporting beef and dairy production. However, that tabulation of the river’s water budget had notable shortcomings, as discussed previously. In this more complete accounting that includes Colorado River water exported outside of the basin’s physical boundary as well as indirect water consumption, we find that irrigated agriculture consumes half (52%) of all Colorado River Basin water, and the portion of direct consumption going to cattle-feed crops dropped from 53% as reported in Richter et al. 11 to 46% in this revised analysis.

These differences are explained by the fact that we now account for all exported water and also include indirect losses of water to reservoir evaporation and riparian/wetland evapotranspiration in our revised accounting, as well as improvements in our estimation of crop-water consumption. However, the punch line of our 2020 paper does not change fundamentally. Irrigated agriculture is the dominant consumer of water from the Colorado River, and 62% of agricultural water consumption goes to alfalfa and grass hay production.

Richter et al. 20 found that alfalfa and grass hay were the largest water consumers in 57% of all sub-basins across the western US, and their production is increasing in many western regions. Alfalfa is favored for its ability to tolerate variable climate conditions, especially its ability to persist under greatly reduced irrigation during droughts and its ability to recover production quickly after full irrigation is resumed, acting as a “shock absorber” for agricultural production under unpredictable drought conditions. The plant is also valued for fixing nitrogen in soils, reducing fertilizer costs. Perhaps most importantly, labor costs are comparatively low because alfalfa is mechanically harvested. Alfalfa is increasing in demand and price as a feed crop in the growing dairy industry of the region 21 . Any efforts to reduce water consumed by alfalfa—either through shifting to alternative lower-water crops or through compensated fallowing 20 —will need to compete with these attributes.

This new accounting provides a more comprehensive and complete understanding of how the Colorado River Basin’s water is consumed. During our study period of 2000–2019, an estimated average of 23.7 billion cubic meters (19.3 million acre-feet) of water was consumed each year before reaching its now-dry delta in Mexico. Schmidt et al. 2 have estimated that a reduction in consumptive use in the Upper and Lower Basins of 3–4 billion cubic meters (2.4–3.2 million acre-feet) per year—equivalent to 22–29% of direct use in those basins—will be necessary to stabilize reservoir levels, and an additional reduction of 1–3 billion cubic meters (~811,000–2.4 million acre-feet) per year will likely be needed by 2050 as climate warming continues to reduce runoff in the Colorado River Basin.
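The paired metric and English volumes in this paragraph follow from a fixed conversion factor of about 1233.48 cubic meters per acre-foot. A small Python sketch of the conversion is given below; the function names are illustrative only, and the factor is rounded.

```python
M3_PER_ACRE_FOOT = 1233.48  # cubic meters in one acre-foot (rounded)


def m3_to_acre_feet(volume_m3: float) -> float:
    """Convert a volume in cubic meters to acre-feet."""
    return volume_m3 / M3_PER_ACRE_FOOT


def acre_feet_to_m3(volume_af: float) -> float:
    """Convert a volume in acre-feet to cubic meters."""
    return volume_af * M3_PER_ACRE_FOOT


# Reduction targets quoted in the text, in billions of cubic meters per year.
for billion_m3 in (1.0, 3.0, 4.0):
    million_af = m3_to_acre_feet(billion_m3 * 1e9) / 1e6
    print(f"{billion_m3:.0f} billion m^3 is about {million_af:.2f} million acre-feet")
```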

We hope that this new accounting will add clarity and a useful informational foundation to the public dialog and political negotiations over Colorado River Basin water allocations and cutbacks that are presently underway 2 . Because a persistent drought and intensifying aridification in the region have placed both people and river ecosystems in danger of water shortages in recent decades, knowledge of where the water goes will be essential in the design of policies for bringing the basin into a sustainable water supply-demand balance.

The data sources and analytical approaches used in this study are summarized below. Unless otherwise noted, all data were assembled for each year from 2000 to 2019 and then averaged. We acknowledge some inconsistency in the manner in which water consumption is measured or estimated across the various data sources and sectors used in this study, as discussed below, and each of these different approaches entails some degree of inaccuracy or uncertainty. We also note that technical measurement or estimation approaches change over time, and new approaches can yield differing results. For instance, the Upper Colorado River Commission is exploring new approaches for estimating crop evapotranspiration in the Upper Basin 22 . When new estimates become available we will update our water budget accordingly.

MCI and agricultural water consumption

The primary source of data on aggregate MCI (municipal, commercial, and industrial) and agricultural water consumption from the Upper and Lower Basins was the US Bureau of Reclamation. Water consumed from the Upper Basin is published in Reclamation’s five-year reports entitled “Colorado River—Upper Basin Consumptive Uses and Losses.” 23 These annual data have been compiled into a single spreadsheet used for this study 24 . Because measurements of agricultural diversions and return flows in the Upper Basin are not sufficiently complete to allow direct calculation of consumptive use, theoretical and indirect methods are used as described in the Consumptive Uses and Losses reports 25 . Reclamation performs these estimates for Colorado, Wyoming, and Utah, but the State of New Mexico provides its own estimates that are collaboratively reviewed with Reclamation staff. The consumptive use of water in thermoelectric power generation in the Upper Basin is provided to Reclamation by the power companies managing each generation facility. Reclamation derives estimates of consumptive use for municipal and industrial purposes from the US Geological Survey’s reporting series (published every 5 years) titled “Estimated Use of Water in the United States” at an 8-digit watershed scale 26 .

Use of shallow alluvial groundwater is included in the water accounting compiled by Reclamation but use of deeper groundwater sources—such as in Mexico and the Gila River Basin—is explicitly excluded in their accounting, and in ours. Reclamation staff involved with water accounting for the Upper and Lower Basins assume that groundwater use counted in their data reports is sourced from aquifers that are hydraulically connected to rivers and streams in the CRB (James Prairie, US Bureau of Reclamation, personal communication, 2023); because of this high connectivity, much of the groundwater being consumed is likely being sourced from river capture as discussed in Jasechko et al. 27 and Wiele et al. 28 and is soon recharged during higher river flows.

Water consumed from the Lower Basin (excluding water supplied by the Gila River Basin) is published in Reclamation’s annual reports entitled “Colorado River Accounting and Water Use Report: Arizona, California, and Nevada.” 3 These consumptive use data are based on measured deliveries and return flows for each individual water user. These data are either measured by Reclamation or provided to the agency by individual water users, tribes, states, and federal agencies 29 . When not explicitly stated in Reclamation reports, attribution of water volumes to MCI or agricultural uses was based on information obtained from each water user’s website, information provided directly by the water user, or information on export water use provided in Siddik et al. 30 . Water use by entities using less than 1.23 million cubic meters (1000 acre-feet) per year on average was allocated to MCI and agricultural uses according to the overall MCI-agricultural percentages calculated within each sub-basin indicated in Tables  1 and 2 for users of greater than 1.23 million cubic meters/year.
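The Lower Basin accounting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: consumptive use is taken as delivery minus return flow, and small users are split using the MCI share observed among large users in the same sub-basin. The function names and the example volumes are hypothetical and are not values from the study.

```python
def consumptive_use(delivery: float, return_flow: float) -> float:
    """Consumptive use = measured delivery minus measured return flow."""
    return delivery - return_flow


def allocate_small_users(large_mci: float, large_ag: float, small_total: float):
    """Split small-user consumption using the MCI/agricultural shares of large users."""
    mci_share = large_mci / (large_mci + large_ag)
    small_mci = small_total * mci_share
    small_ag = small_total - small_mci
    return small_mci, small_ag


# Hypothetical sub-basin, volumes in million cubic meters per year.
small_mci, small_ag = allocate_small_users(large_mci=250.0, large_ag=750.0, small_total=10.0)
print(f"small users: {small_mci:.1f} MCI, {small_ag:.1f} agricultural")
```

Computing the agricultural part as the remainder keeps the two allocated volumes summing exactly to the small-user total.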

Disaggregation of water consumption by sector was particularly important and challenging for the Central Arizona Project given that this canal accounts for 21% of all direct water consumption in the Lower Basin. Reclamation accounts for the volumes of annual diversions into the Central Arizona Project canal but the structure serves 1071 water delivery subcontracts. We classified every unique Central Arizona Project subcontract delivery between 2000–2019 by its final water use to derive an estimated split between agricultural and MCI uses. Central Arizona Project subcontract delivery data were obtained from the current and archived versions of the project’s website summaries in addition to being directly obtained from the agency through a public information request. Subcontract deliveries were classified based on the final end use, including long-term and temporary leases of project water. This accounting also includes the storage of water in groundwater basins for later MCI or agricultural use. Additionally, water allocated to Native American agricultural uses that was subsequently leased to cities was classified as an MCI use.

Data for the Gila River basin were obtained from two sources. The Arizona Department of Water Resources has published data for surface water use in five “Active Management Areas” (AMAs) located in the Gila River basin: Prescott AMA, Phoenix AMA, Pinal AMA, Tucson AMA, and Santa Cruz AMA 31 . The water-use data for these AMAs are compiled from annual reports submitted by each water user (contractor) and then reviewed by the Arizona Department of Water Resources. The AMA water-use data are categorized by purpose of use, facilitating our separation into MCI and agricultural uses. These data are additionally categorized by water source; only surface water sourced from the Gila River hydrologic system was counted (deep groundwater use was not). The AMA data were supplemented with data for the upper Gila River basin provided by the University of Arizona 32 . We have assumed that all water supplied by the Gila River Basin is fully consumed, as the river is almost always completely dry in its lower reaches (less than 1% flows out of the basin into the Colorado River, on average 33 ).

Data for Mexico were obtained from Hernandez-Cruz et al. 10 based on estimates for 2008–2015. Agricultural demands were estimated from annual reports of irrigated area and water use published by the Ministry of Agriculture and the evapotranspiration estimates of the principal crops published by the National Institute for Forestry, Animal Husbandry, and Agricultural Research of Mexico 10 . The average annual volume of Colorado River water consumption in Mexico estimated by these researchers is within 1% of the cross-border delivery volume estimated by the Bureau of Reclamation for 2000–2019 in its Colorado River Accounting and Water Use Reports 3 .

Exported water consumption

Annual average inter-basin transfer volumes for each of 46 canals and pipelines exporting water outside of the Upper Basin were obtained from Reclamation’s Consumptive Uses and Losses spreadsheet 34 . Data for the Colorado River Aqueduct in the Lower Basin were obtained from Siddik et al. 30 . Data for exported water in Mexico were available from Hernandez-Cruz et al. 10 . We assigned any seepage or evaporation losses from inter-basin transfers to their proportional end uses. All uses of exported water are considered to be consumptive uses with respect to the Colorado River, because none of the water exported out of the basin is returned to the Colorado River Basin.

We relied on data from Siddik et al. 30 to identify whether the water exported out of the Colorado River Basin was for only MCI or agricultural use. When more than one water use purpose was identified, as well as for all major inter-basin transfers, we used government and inter-basin transfer project websites or information obtained directly from the project operator or water manager to determine the volume of water transferred and the end uses. Major recipients of exported water include the Coachella Valley Water District (California); Metropolitan Water District of Southern California (particularly for San Diego County, California); Northern Colorado Water Conservancy District; City of Denver (Colorado); the Central Utah Project; City of Albuquerque (New Mexico); and the Middle Rio Grande Conservancy District (New Mexico). We did not pursue sectoral water-use information for 17 of the 46 Upper Basin inter-basin transfers due to the relatively low volume of water transferred by each system (<2.47 million cubic meters or 2000 acre-feet), and instead assigned the average MCI or agricultural percentage (72% MCI, 28% agricultural) from all other inter-basin transfers in the Upper Basin. The export volume of these 17 inter-basin transfers sums to 9.76 million cubic meters (7910 acre-feet) per year, equivalent to 1% of the total volume exported from the Upper Basin.

Reservoir evaporation

Evaporation estimates for the Upper Basin and Lower Basin are based upon Reclamation’s HydroData repository 35 . Reclamation’s evaporation estimates are based on the standardized Penman-Monteith equation as described in the “Lower Colorado River Annual Summaries of Evapotranspiration and Evaporation” reports 17 . The Penman-Monteith estimates are based on pan evaporation measurements. Evaporation estimates for the Salt River Project reservoirs in the Gila River basin were provided by the Salt River Project in Arizona (Charlie Ester, personal communication, 2023).

Another consideration with reservoirs is the volume of water that seeps into the banks or sediments surrounding the reservoir when reservoir levels are high, but then drains back into the reservoir as water levels decline 36 . This has the effect of either exacerbating reservoir losses (consumptive use) or offsetting evaporation when bank seepage flows back into a reservoir. The flow of water into and out of reservoir banks is non-trivial; during 1999–2008, an estimated 247 million cubic meters (200,000 acre-feet) of water drained from the canyon walls surrounding Lake Powell into the reservoir each year, providing additional water supply 36 . However, the annual rate of alternating gains or losses has not been sufficiently measured at any of the basin’s reservoirs and therefore is not included in Tables  1 and 2 .

Riparian and wetland vegetation evapotranspiration

We exported the total annual evapotranspiration depth at a 30-meter resolution from OpenET 37 using Google Earth Engine from 2016 to 2019 to align with OpenET’s data availability starting in 2016. Total annual precipitation depths, sourced from gridMET 38 , were resampled to align with the evapotranspiration raster resolution. Subsequently, a conservative estimate of the annual water depth utilized by riparian vegetation from the river was derived by subtracting the annual precipitation raster from the evapotranspiration raster for each year. Positive differentials, indicative of river-derived evapotranspiration, were then multiplied by the riparian vegetation area as identified in the CO-RIP 16 dataset to estimate the total annual volumetric water consumption by riparian vegetation across the Upper, Lower, and Gila River Basins. The annual volumes calculated for the four years were then averaged to obtain riparian vegetation evapotranspiration in the three basins. Because the entire flow of the Colorado River is diverted into the Canal Alimentador Central near the international border, very little riparian evapotranspiration occurs along the river south of the international border in the Mexico basin.
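The raster arithmetic described above can be illustrated with a minimal pure-Python sketch. It assumes that every listed cell is a 30 m by 30 m cell lying entirely within mapped riparian vegetation, and it uses made-up annual depths; the function name, the dictionaries, and all numbers are hypothetical and are not data from the study.

```python
CELL_AREA_M2 = 30.0 * 30.0  # one 30 m x 30 m raster cell


def riparian_volume_m3(et_mm, precip_mm, cell_area_m2=CELL_AREA_M2):
    """Sum max(ET - P, 0) over riparian cells and convert depth (mm) to volume (m^3)."""
    total = 0.0
    for et, precip in zip(et_mm, precip_mm):
        excess_mm = max(et - precip, 0.0)  # keep only positive differentials
        total += excess_mm / 1000.0 * cell_area_m2
    return total


# Hypothetical annual depths (mm) for three riparian cells in each year.
et_by_year = {
    2016: [900.0, 400.0, 1200.0],
    2017: [950.0, 420.0, 1100.0],
    2018: [1000.0, 380.0, 1250.0],
    2019: [880.0, 500.0, 1150.0],
}
precip_by_year = {
    2016: [300.0, 450.0, 200.0],
    2017: [350.0, 300.0, 250.0],
    2018: [250.0, 400.0, 150.0],
    2019: [400.0, 520.0, 300.0],
}

annual_m3 = {
    year: riparian_volume_m3(et_by_year[year], precip_by_year[year])
    for year in et_by_year
}
mean_annual_m3 = sum(annual_m3.values()) / len(annual_m3)
print(annual_m3, mean_annual_m3)
```

Cells where precipitation exceeds evapotranspiration contribute zero rather than a negative volume, which is what makes the estimate conservative.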

In addition to water consumed by riparian evapotranspiration within the Lower Basin, the Salton Sea receives agricultural drain water from both the Imperial Irrigation District and the Coachella Valley Water District, stormwater drainage from the Coachella Valley, and inflows from the New and Alamo Rivers 39 . Combined inflows to the Sea during 2015–2019 were added to our estimates of riparian/wetland evapotranspiration in the Lower Basin.

Similarly, Mexico receives drainage water from the Wellton–Mohawk bypass drain originating in southern Arizona that empties into the Cienega de Santa Clara (a wetland); this drainage water is included as riparian/wetland evapotranspiration in the Mexico basin.

Crop-specific water consumption

The volumes of total agricultural consumption reported for each sub-basin in Tables  1 and 2 were obtained from the same data sources described above for MCI consumption and exported water. The portion (%) of those agricultural consumption volumes going to each individual crop was then allocated according to percentage estimates of each crop’s water consumption in each accounting unit using methods described in Richter et al. 20 and detailed here.

Monthly crop water requirements during 1981–2019 for 13 individual crops, representing 68.8% of total irrigated area in the US in 2019, were estimated using the AquaCrop-OS model (Table SI-3) 40 . For 17 additional crops representing about 25.4% of the total irrigated area, we used a simple crop growth model following Marston et al. 41 as crop parameters needed to run AquaCrop-OS were not available. A list of the crops included in this study is shown in Table SI-3. The crop water requirements used in Richter et al. 11 were based on a simplistic crop growth model, often using seasonal crop coefficients, whereas we use AquaCrop-OS 40 , a robust crop growth model, to produce more realistic crop growth and crop water estimates for major crops. AquaCrop-OS is an open-source version of the AquaCrop model 42 , a crop growth model capable of simulating herbaceous crops. Additionally, we leverage detailed local data unique to the US, including planting dates and subcounty irrigated crop areas, to produce estimates at a finer spatial resolution than the previous study. We obtained crop-specific planting dates from USDA 43 progress data at the state level. For crops that did not have USDA crop progress data, we used data from FAO 44 and the CUP+ model 45 for planting dates. We used climate data (precipitation, minimum and maximum air temperature, reference ET) from gridMET 38 , soil texture data from the ISRIC 46 database and crop parameters from AquaCrop-OS to run the model. The modeled crop water requirement was partitioned into blue and green components following the framework from Hoekstra et al. 47 , assuming that the blue and green water consumed on a given day are proportional to the amounts of blue and green soil moisture available on that day.

When applying the simple crop growth model, daily gridded (2.5 arc minutes) crop-specific evapotranspiration (ETc) was computed by taking the product of reference evapotranspiration (ETo) and crop coefficient (Kc), where ETo was obtained from gridMET. Crop coefficients were calculated using planting dates and crop coefficient curves from FAO and the CUP+ model. Kc was set to zero outside of the growing season. We partitioned the daily ETc into blue and green components by following the methods from ref. 41 . It is assumed that crop water demands are met by irrigation whenever they exceed effective precipitation (the latter calculated using the USDA Soil Conservation Service method; USDA, 1968 48 ). We obtained county level harvested area from USDA 43 and disaggregated it to sub-county level using the Cropland Data Layer (CDL) 49 and the Landsat-based National Irrigation Dataset (LANID) 50 . The CDL is an annual raster layer that provides crop-specific land cover data, while the LANID provides irrigation status information. The CDL and LANID rasters were multiplied and aggregated to 2.5 arc minutes to match the AquaCrop-OS output. We produced a gridded crop area map by using this resulting product as weights to disaggregate county level area. CDL is unavailable before 2008. Therefore, we used land use data from ref. 51 in combination with an average CDL map and county level harvested area to produce gridded crop harvested area. We computed volumetric water consumption by multiplying the crop water requirement depth by the corresponding crop harvested area.
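The simple crop coefficient approach lends itself to a short worked example. The Python sketch below computes ETc as the product of ETo and Kc, splits it into blue and green water under the simplifying assumption that effective precipitation is used first and irrigation covers the remainder, and converts a depth to a volume. The partitioning rule is an assumption made for illustration and may differ in detail from the method of ref. 41; effective precipitation is taken as an input rather than computed with the USDA Soil Conservation Service method; all function names and numbers are hypothetical.

```python
def crop_et_mm(eto_mm: float, kc: float) -> float:
    """Daily crop evapotranspiration ETc = ETo * Kc (Kc is 0 outside the growing season)."""
    return eto_mm * kc


def split_blue_green(etc_mm: float, effective_precip_mm: float):
    """Assumed split: green is ETc met by effective precipitation, blue is the remainder."""
    green = min(etc_mm, effective_precip_mm)
    blue = max(etc_mm - effective_precip_mm, 0.0)
    return blue, green


def volume_m3(depth_mm: float, area_ha: float) -> float:
    """Depth (mm) over an area (ha) as a volume (m^3); 1 mm over 1 ha equals 10 m^3."""
    return depth_mm * area_ha * 10.0


# Hypothetical day: ETo of 8 mm, mid-season Kc of 0.5, 1 mm of effective precipitation.
etc = crop_et_mm(eto_mm=8.0, kc=0.5)
blue, green = split_blue_green(etc, effective_precip_mm=1.0)
print(f"ETc={etc} mm, blue={blue} mm, green={green} mm")
print(f"blue volume over 100 ha: {volume_m3(blue, area_ha=100.0)} m^3")
```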

Data availability

All data compiled and analyzed in this study are publicly available as cited and linked in our Methods section. Our compilation of these data is also available from Hydroshare at: http://www.hydroshare.org/resource/2098ae29ae704d9aacfd08e030690392 .

Code availability

All model code and software used in this study have been accessed from sources cited in our Methods section. We used AquaCrop-OS (v5.0a), an open source version of the AquaCrop crop growth model, to run crop simulations. This model is publicly available at http://www.aquacropos.com/ . For estimating riparian evapotranspiration, we used ArcGIS Pro 3.1.3 on the Google Earth Engine. Riparian vegetation distribution maps were sourced from Dryad at https://doi.org/10.5061/dryad.3g55sv8 .

Stromberg, J. C., Andersen, D. C. & Scott, M. L. Riparian floodplain wetlands of the arid and semiarid southwest. In Wetland Habitats of North America: Ecology and Conservation Concerns, Chapter 24, pp. 343–356 (University of California Press, 2012). https://www.ucpress.edu/book/9780520271647/wetland-habitats-of-north-america .

Schmidt, J. C., Yackulic, C. B. & Kuhn, E. The Colorado River water crisis: Its origin and the future. WIREs Water https://doi.org/10.1002/wat2.1672 (2023).

Colorado River Accounting and Water Use Report: Arizona, California, and Nevada. Interior Region 8: Lower Colorado Basin (US Bureau of Reclamation, 2023). Annual reports available under “Water Accounting Reports” at https://www.usbr.gov/lc/region/g4000/wtracct.html .

Water Operations: Historic Data (US Bureau of Reclamation, 2023). https://www.usbr.gov/rsvrWater/HistoricalApp.html .

Colorado River Compact, 1922. US Bureau of Reclamation. https://www.usbr.gov/lc/region/pao/pdfiles/crcompct.pdf .

Kuhn, E. & Fleck, J. Science Be Dammed: How Ignoring Inconvenient Science Drained the Colorado River (The University of Arizona Press, 2019). https://uapress.arizona.edu/book/science-be-dammed .

Castle, A. & Fleck, J. The Risk of Curtailment under the Colorado River Compact. https://doi.org/10.2139/ssrn.3483654 (2019).

US Bureau of Reclamation. Colorado River Basin Water Supply and Demand Study: Technical Report C – Water Demand Assessment https://www.usbr.gov/lc/region/programs/crbstudy/finalreport/Technical%20Report%20C%20-%20Water%20Demand%20Assessment/TR-C-Water_Demand_Assessmemt_FINAL.pdf (2012).

Utilization of the Waters of the Colorado and Tijuana Rivers and of the Rio Grande. International Treaty between the United States and Mexico, February 3, 1944. (International Boundary and Waters Commission, 1944). https://www.ibwc.gov/wp-content/uploads/2022/11/1944Treaty.pdf .

Hernández-Cruz, A. et al. Assessing water management strategies under water scarcity in the Mexican portion of the Colorado River Basin. J. Water Resour. Plan. Manag. 149, 04023042 (2023).

Richter, B. D. et al. Water scarcity and fish imperilment driven by beef production. Nat. Sustain. 3, 319–328 (2020).

Biden-Harris Administration announces historic Consensus System Conservation Proposal to protect the Colorado River Basin. US Department of the Interior, May 22, 2023. https://www.doi.gov/pressreleases/biden-harris-administration-announces-historic-consensus-system-conservation-proposal .

Wheeler, K. G. et al. What will it take to stabilize the Colorado River? Science 377, 373–375 (2022).

Richter, B. D. Decoupling urban water use from population growth in the Colorado River Basin. J. Water Plan. Manag. 149, 2 (2023).

Schmidt, J. C. Maps Matter: A few suggested changes to the Colorado River basin base map. Center for Colorado River Studies. (Utah State University, 2022).

Woodward, B. D. et al. Co-Rip: A riparian vegetation and corridor extent dataset for Colorado river basin streams and rivers. ISPRS Int. J. Geo Inform. 7, 397 (2018).

Article   ADS   Google Scholar  

Lower Colorado River Annual Summaries of Evapotranspiration and Evaporation . (US Bureau of Reclamation, Lower Colorado Region, 2023). https://www.usbr.gov/lc/region/g4000/wtracct.html .

Richter, B. D., Powell, E. M., Lystash, T. & Faggert, M. Protection and restoration of freshwater ecosystems. Chapter 5 in Miller, Kathleen A., Alan F. Hamlet, Douglas S. Kenney, and Kelly T. Redmond (Eds.) Water Policy and Planning in a Variable and Changing Climate . (CRC Press - Taylor & Francis Group, 2016).

Shao, Elena. “The Colorado River is shrinking. See what’s using all the water.” New York Times , May 22, 2023. https://www.nytimes.com/interactive/2023/05/22/climate/colorado-river-water.html .

Richter, B. D., et al. Alleviating water scarcity by optimizing crop mixes. Nat. Water . https://doi.org/10.1038/s44221-023-00155-9 .

Njuki, E. U.S. dairy productivity increased faster in large farms and across southwestern states . U.S. Economic Research Service, US Department of Agriculture, March 22, 2022. https://www.ers.usda.gov/amber-waves/2022/march/u-s-dairy-productivity-increased-faster-in-large-farms-and-across-southwestern-states/ .

Mefford, B. & Prairie J., eds. Assessing Agricultural Consumptive Use in the Upper Colorado River Basin - Phase III Report U.S. Bureau of Reclamation and the Upper Colorado River Commission. http://www.ucrcommission.com/reports-studies/ (2022).

Upper Basin Consumptive Uses and Losses (Bureau of Reclamation). Annual reports available at https://www.usbr.gov/uc/envdocs/plans.html .

Bureau of Reclamation. “Consumptive Uses and Losses spreadsheet 1971–2020” Colorado River Basin Natural Flow and Salt Data, Supporting data for consumptive uses and losses computation. https://www.usbr.gov/lc/region/g4000/NaturalFlow/documentation.html .

Upper Colorado River Basin Consumptive Uses and Losses Report 2016–2020 . US Department of Interior: Bureau of Reclamation. Five year reports available under “Colorado River-Consumptive Uses and Losses Reports” at https://www.usbr.gov/uc/envdocs/plans.html .

Estimated Use of Water in the United States . US Department of Interior: US Geological Survey. Reports available every five years at https://www.usgs.gov/mission-areas/water-resources/science/water-use-united-states .

Jasechko, S. et al. Widespread potential loss of streamflow into underlying aquifers across the USA. Nature 591 , 391–395 (2021).

Wiele, S. M., Leake, S. A., Owen-Joyce, S. J. & and McGuire, E. H. Update of the Accounting Surface Along the Lower Colorado River US Department of the Interior: US Geological Survey Scientific Investigations Report 2008–5113 (2008).

Bruce, B. W., et al. Comparison of U.S. Geological Survey and Bureau of Reclamation water-use reporting in the Colorado River Basin U.S. Geological Survey Scientific Investigations Report 2018–5021 . https://doi.org/10.3133/sir20185021 (2018).

Siddik, M. A. B., Dickson, K. E., Rising, J., Ruddell, B. L. & Marston, L. T. Interbasin water transfers in the United States and Canada. Sci. Data 10 , 27 (2023). Data spreadsheet provided by M.A.B. Siddik.

Article   PubMed   PubMed Central   Google Scholar  

Active Management Areas : AMA Annual Supply and Demand Dashboard (Arizona Department of Water Resources, 2023). https://azwater.gov/ama/ama-data .

Lacroix, K. M. et al. Wet water and paper water in the Upper Gila River Watershed https://extension.arizona.edu/sites/extension.arizona.edu/files/pubs/az1708-2016_0.pdf The University of Arizona Cooperative Extension, AZ1708. Data spreadsheet provided by A. Hullinger (2016).

Surface-Water Annual Statistics for the Nation: Gila River at Dome, Arizona . US Geological Survey. Available at https://waterdata.usgs.gov/nwis/annual/?referred_module=sw&site_no=09520500&por_09520500_5810=19975,00060,5810,1905,2024&year_type=C&format=html_table&date_format=YYYY-MM-DD&rdb_compression=file&submitted_form=parameter_selection_list .

Consumptive Uses and Losses spreadsheet 1971–2020 . Bureau of Reclamation, Colorado River Basin Natural Flow and Salt Data, Supporting data for consumptive uses and losses computation. https://www.usbr.gov/lc/region/g4000/NaturalFlow/documentation.html .

HydroData: Reservoir Data . US Bureau of Reclamation. https://www.usbr.gov/uc/water/ .

Myers, T. Loss rates from Lake Powell and their impact on management of the Colorado River. J. Am. Water Resour. Assoc. 49 , 1213–1224 (2013).

Melton, F. S. et al. OpenET: filling a critical data gap in water management for the western United States. J. Am. Water Resour. Assoc. 58 , 971–994 (2022).

Abatzoglou, J. T. Development of gridded surface meteorological data for ecological applications and modelling. Int. J. Climatol. 33 , 121–131 (2013).

Salton Sea Management Program: Long-Range Plan Public Draft (2022). California Natural Resources Agency. https://saltonsea.ca.gov/wp-content/uploads/2022/12/Salton-Sea-Long-Range-Plan-Public-Draft-Dec-2022.pdf .

Foster, T. et al. AquaCrop-OS: an open source version of FAO’s crop water productivity model. Agricul. Water Manag. 181 , 18–22 (2017).

Marston, L. T., et al. Reducing water scarcity by improving water productivity in the United States. Environ. Res. Lett. 15 https://doi.org/10.1088/1748-9326/ab9d39 (2020).

Steduto, P., Hsiao, T. C., Fereres, E. & Raes, D. Crop yield response to water (2012). 1028. Rome: Food and Agriculture Organization of the United Nations.

USDA, National Agricultural Statistics Service. “Quick Stats.” http://quickstats.nass.usda.gov .

Allen, R. G., Pereira, L. S., Raes, D. & Smith, M. FAO Irrigation and drainage paper No. 56 56, (e156. Food and Agriculture Organization of the United Nations, Rome, 1998).

Orange, M. N., Scott Matyac, J. & Snyder, R. L. Consumptive use program (CUP) model. IV Int. Symp. Irrig. Horticult. Crops 664 , 461–468 (2003).

Hengl, T. et al. SoilGrids250m: Global gridded soil information based on machine learning. PLoS One 12 , e0169748 (2017).

Hoekstra, A. Y. Green-blue water accounting in a soil water balance. Adv. Water Resour. 129 , 112–117 (2019).

USDA (US Department of Agriculture). A Method for Estimating Volume and Rate of Runoff in Small Watersheds . SCS-TP-149. Washington DC: Soil Conservation Service (1968).

Johnson, D. M., & Mueller, R. 2010. “Cropland Data Layer.” https://nassgeodata.gmu.edu/CropScape/ .

Xie, Y., Gibbs, H. K. & Lark, T. J. Landsat-based Irrigation Dataset (LANID): 30m resolution maps of irrigation distribution, frequency, and change for the US, 1997–2017. Earth Syst. Sci. Data 13 , 5689–5710 (2021).

Sohl, T. et al. Modeled historical land use and land cover for the conterminous United States. J. Land Use Sci. 11 , 476–499 (2016).

Download references

Acknowledgements

This paper is dedicated to our colleague Jack Schmidt in recognition of his retirement and enormous contributions to the science and management of the Colorado River. The authors thank James Prairie of the US Bureau of Reclamation, Luke Shawcross of the Northern Colorado Water Conservancy District, Charlie Ester of the Salt River Project, and Brian Woodward of the University of California Cooperative Extension for their assistance in accessing data used in this study. The authors also thank Rhett Larson at the Sandra Day O’Connor School of Law at Arizona State University for their review of Arizona water budget data, and the Central Arizona Project for providing delivery data by each subcontract. G.L., L.M., and K.F.D. acknowledge support by the United States Department of Agriculture National Institute of Food and Agriculture grant 2022-67019-37180. L.T.M. acknowledges the support of the National Science Foundation grant CBET-2144169 and the Foundation for Food and Agriculture Research Grant No. FF-NIA19-0000000084. R.R.R. acknowledges the support of the National Science Foundation grant CBET-2115169.

Author information

Authors and affiliations.

World Wildlife Fund, 1250 24th St NW, Washington, DC, 20037, USA

Brian D. Richter

Sustainable Waters, Crozet, Virginia, 22932, USA

The Charles E. Via, Jr. Department of Civil and Environmental Engineering, Virginia Tech, Blacksburg, VA, 24061, USA

Gambhir Lamsal, Landon Marston & Sameer Dhakal

Northern Arizona University, Flagstaff, AZ, 86011, USA

Laljeet Singh Sangha, Richard R. Rushforth & Benjamin L. Ruddell

Department of Geography and Spatial Sciences, University of Delaware, Newark, DE, 19716, USA

Dongyang Wei & Kyle Frankel Davis

Department of Plant and Soil Sciences, University of Delaware, Newark, DE, 19716, USA

Kyle Frankel Davis

Instituto de Investigaciones Oceanologicas, Universidad Autonoma de Baja California, Ensenada, Baja California, México

Astrid Hernandez-Cruz

Department of Land, Air and Water Resources, University of California at Davis, Davis, CA, 95616, USA

Samuel Sandoval-Solis

Center for Colorado River Studies, Utah State University, Logan, UT, 84322, USA

John C. Schmidt

Contributions

B.D.R. designed the study, compiled and analyzed data, wrote the manuscript and supervised co-author contributions. G.L. compiled all crop data, estimated crop evapotranspiration, and prepared figures. S.D. compiled all riparian vegetation data and estimated riparian evapotranspiration. L.S.S. and R.R.R. accessed, compiled, and analyzed data from the Central Arizona Project. D.W. compiled data and prepared figures. A.H.-C. and S.S.-S. compiled and analyzed data for Mexico. J.C.S. compiled and analyzed reservoir evaporation data and edited the manuscript. L.M., B.L.R., and K.F.D. supervised data compilation and analysis and edited the manuscript.

Corresponding author

Correspondence to Brian D. Richter.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Peer review

Peer review information.

Communications Earth & Environment thanks James Booker and Becky Bolinger for their contribution to the peer review of this work. Primary Handling Editors: Aliénor Lavergne and Carolina Ortiz Guerrero. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Richter, B.D., Lamsal, G., Marston, L. et al. New water accounting reveals why the Colorado River no longer reaches the sea. Commun Earth Environ 5, 134 (2024). https://doi.org/10.1038/s43247-024-01291-0

Received: 03 October 2023

Accepted: 27 February 2024

Published: 28 March 2024

DOI: https://doi.org/10.1038/s43247-024-01291-0

recent research paper on data warehouse

COMMENTS

  1. data warehousing Latest Research Papers

    Abstract Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is basically a join between a fast stream (S) and a slow disk-based relation (R). In the modern era of technology, huge amounts of data are being generated swiftly on a daily basis which needs to be instantly analyzed ...

  2. PDF Lakehouse: A New Generation of Open Platforms that Unify Data

    Data Systems Research (CIDR '21), January 11-15, 2021, Online. data at low cost, but on the other hand, punted the problem of data quality and governance downstream. In this architecture, a small subset of data in the lake would later be ETLed to a downstream data warehouse (such as Teradata) for the most important decision

  3. (PDF) TRENDS IN DATA WAREHOUSING TECHNIQUES

    The Big Data Warehouse (BDW) is a scalable, high-performance system that uses Big Data techniques and technologies to support mixed and complex analytical workloads (e.g., streaming analysis ...

  4. PDF The Data Lakehouse: Data Warehousing and More

    This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices.

  5. An Overview of Data Warehouse and Data Lake in Modern Enterprise Data

    Data is the lifeblood of any organization. In today's world, organizations recognize the vital role of data in modern business intelligence systems for making meaningful decisions and staying competitive in the field. Efficient and optimal data analytics provides a competitive edge to its performance and services. Major organizations generate, collect and process vast amounts of data ...

  6. Data Warehouse with Big Data Technology for Higher Education

    It is possible to implement a data warehouse for a typical university information system [8]. An academic data warehouse supports the decisional and analytical activities regarding the three major components in the university context: didactics, research, and management [9]. The data warehouse has an important role in educational data analysis [10]. Table 1.

  7. Big Data Warehouse for Healthcare-Sensitive Data Applications

    Recent research studies in the healthcare sector have focused on developing GDPR compliant protocols in the form of a consent management system for data collection. ... BigO data warehouse architecture uses secured views for the data scientists to mine the datasets. ... This paper first implemented data access and storage components of the BigO ...

  8. Comprehensive survey on data warehousing research

    Various issues and challenges in the field of data warehousing are presented in many studies during the recent years. In this paper, a comprehensive survey is presented to take a holistic view of the research trends in the fields of data warehousing. This paper presents a systematic division of work of researchers in the fields of data warehousing.

  9. Recent Advances and Research Problems in Data Warehousing

    Current research has led to new developments in all aspects of data warehousing; however, there are still a number of problems that need to be solved to make data warehousing effective. In this paper, we discuss recent developments in data warehouse modelling, view maintenance, and parallel query processing.

  10. [2310.08697] The Data Lakehouse: Data Warehousing and More

    This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake combined, while also providing additional advantages. We take today's data warehousing and break it down into implementation independent components, capabilities, and practices. ...

  11. Data

    The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects is essentially based on the proper modeling of the ETL process. As there is no standard model for the representation and design of this process, several researchers have made efforts to propose modeling methods based on different formalisms, such ...

  12. 54619 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on DATA WAREHOUSING. Find methods information, sources, references or conduct a literature review on ...

  13. (PDF) Data Warehouse Concept and Its Usage

    Abstract. A data warehouse is a repository for all data which is collected by an organization in various operational systems; it can be either physical or logical. It is a subject oriented ...

  14. A Data Warehouse Approach for Business Intelligence

    Abstract: In a cloud-based data warehouse (DW), business users can access and query data from multiple sources and geographically distributed places. Business analysts and decision makers are counting on DWs, especially for data analysis and reporting. Temporal and spatial data are two factors that seriously affect decision-making and marketing strategies, and many applications require modelling ...

  15. Research of Data Warehouse for Science and Technology Management System

    A core work of the science and technology management system is to support the integration and utilization of massive data from distributed systems using data warehouse technology. In this paper, we focus on this work. First, we introduce the background of science and technology management by illustrating the scheme of project management business flows. Then, to define the science and ...

  16. [PDF] Data warehousing and OLAP over big data: current challenges and

    In this paper, we highlight open problems and actual research trends in the field of Data Warehousing and OLAP over Big Data, an emerging term in Data Warehousing and OLAP research. We also derive several novel research directions arising in this field, and put emphasis on possible contributions to be achieved by future research efforts.

  17. Clinical research data warehouse governance for distributed research

    Background and significance. An enterprise data warehouse presents opportunities to conduct previously impractical studies of rare exposures or outcomes where very large sample sizes are needed, such as population-based surveillance, treatment safety, or comparative effectiveness research. 1 However, even a large healthcare organization may have insufficient subjects to support such studies.

  18. Data warehouse architecture and design

    A data warehouse is attractive as the main repository of an organization's historical data and is optimized for reporting and analysis. In this paper, we present the process of data warehouse architecture development and design. We highlight the different aspects to be considered in building a data warehouse. These range from data store characteristics to data modeling and ...

  19. [PDF] Research problems in data warehousing

    Recent Advances and Research Problems in Data Warehousing. S. Samtani M. Mohania Vijay Kumar Y. Kambayashi. Business, Computer Science. ER Workshops. 1998. TLDR. This paper discusses recent developments in data warehouse modelling, view maintenance, and parallel query processing, and possible solutions for exploratory research are presented.

  20. Data Warehouse Research Papers

    The data warehouse design task needs to consider both the end-user requirements and the organization data sources. For this reason, the data warehouse design has been traditionally considered a reengineering process, guided by requirements, from the data sources.Most current design methods available demand highly-expressive end-user requirements as input, in order to carry out the exploration ...

  21. Big Data and New Data Warehousing Approaches

    However, the increasing volume of data can cause strain to well-established systems that have been in place for years. Relational data warehouse systems at times cannot cope with Big Data due to ...

  22. PDF The Study on Data Warehouse Design and Usage

    The idea of data warehousing is deceptively simple. It is important to prepare a data warehouse using a proper design methodology and process, because data warehousing provides users with large amounts of clean, organized, and summarized data, which greatly facilitates data mining.

  23. [2403.20329] ReALM: Reference Resolution As Language Modeling

    ReALM: Reference Resolution As Language Modeling. Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the ...

  24. Introducing DBRX: A New State-of-the-Art Open LLM

    DBRX advances the state-of-the-art in efficiency among open models thanks to its fine-grained mixture-of-experts (MoE) architecture. Inference is up to 2x faster than LLaMA2-70B, and DBRX is about 40% of the size of Grok-1 in terms of both total and active parameter-counts. When hosted on Mosaic AI Model Serving, DBRX can generate text at up to ...

  25. Apple's latest AI research beats GPT-4 in contextual data parsing

    Apple AI research: ReALM is smaller, faster than GPT-4 when parsing contextual data. Apple AI research reveals a model that will make giving commands to Siri faster and more efficient by ...

  26. Research in data warehouse modeling and design: Dead or alive?

    Though a lot has been written about how a data warehouse should be designed, there is no consensus on a design method yet. This paper follows from a wide discussion that took place in Dagstuhl ...

  27. Research on shipping statistics method based on AIS big data mining

    10.1117/12.3019611. Bibcode: 2024SPIE12978E..19Z. In order to solve the problems of traditional shipping statistics, this paper puts forward the method of shipping statistics based on AIS big data, and gives complete technical process and technical scheme including big data platform construction, data access, data cleaning, data warehouse ...

  28. New water accounting reveals why the Colorado River no longer ...

    Here we apply authoritative primary data sources and modeled crop and riparian/wetland evapotranspiration estimates to compile a water budget based on average consumptive water use during 2000-2019.