Considering Types of Data
16 Geospatial Research Data in Canada: An Overview of Regional Projects
Martin Chandler; Kara Handren; Stéfano Biondo; Amber Leahey; Sarah Rutley; and Rhys Stevens
Learning Outcomes
By the end of this chapter you should be able to:
- Understand the current state of geospatial research data infrastructure and services across different regions of Canada.
- Explain some unique considerations for managing geospatial data.
- Provide examples and exemplars of geospatial data management.
- Recognize future directions for geospatial Research Data Management in Canada.
Introduction
Libraries in Canada support a variety of services for the discovery, access, and preservation of geospatial research data. Infrastructure and services have been developed regionally, primarily at academic institutions, to support the management of geospatial data collections and resources. This has created a patchwork of research data services across the country. This chapter will provide an overview of the approaches and key infrastructure projects for the management of geospatial research data in Canada.
Spatial/geospatial (hereafter referred to as “geospatial”) data have not always been recognized as requiring special consideration in Research Data Management (RDM). However, because of unique aspects of their creation, use, and access, geospatial data require management considerations distinct from those in other areas of RDM.
Responsibility for curating geospatial data has generally fallen to geospatial/data librarians or to data managers with subject expertise, as these two groups are best equipped to meet the challenges that geospatial data pose. This chapter outlines the challenges particular to geospatial RDM, describes the regional projects currently underway or in development to address the preservation of and access to geospatial research data, and considers future directions for geospatial-centric RDM in Canada.
Geospatial Data and GIS
What are geospatial data, and how are geospatial research data distinguished from research data as a whole? Any data about objects or events that have a location are geospatial data. The location may be static (fixed in one place, such as a building or an earthquake) or dynamic (changing or moving over time, such as urban growth or the effects of drought on neighbouring water tables). Geospatial data combine location information with characteristics of an object, event, or concept (“attribute” data) and often, though not always, temporal information (Stock & Guesgen, 2016).
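As a minimal sketch of the three components just described, the hypothetical `GeoRecord` class below pairs a location with attribute data and an optional time stamp. The class name, field names, and coordinate values are illustrative assumptions, not part of any particular GIS or standard.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class GeoRecord:
    """Hypothetical record: a location, its attributes, and an optional time."""
    longitude: float                                 # location, decimal degrees
    latitude: float
    attributes: dict = field(default_factory=dict)   # "attribute" data
    observed_at: Optional[datetime] = None           # temporal data, if any

    def is_temporal(self) -> bool:
        return self.observed_at is not None

# Made-up examples: a building with no time stamp and a dated earthquake
building = GeoRecord(-63.58, 44.64, {"type": "building"})
quake = GeoRecord(-123.0, 48.5, {"type": "earthquake", "magnitude": 4.1},
                  datetime(2021, 3, 1, 12, 0))
print(building.is_temporal(), quake.is_temporal())  # False True
```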
Working with geospatial data often relies on a geographic information system (GIS), such as QGIS, ArcGIS, or Google Earth. A GIS provides tools to create, analyze, and export geospatial data, including creating and sharing datasets. Geospatial research data often combine or join spatial data points (or features) with other source data and variables so that the data can be used in analysis. These variables are often geographic in nature, such as census data at the census tract or postal code level.
Geospatial RDM depends heavily on exporting data to various formats (format interoperability), previewing and reusing static maps, using or reusing statistical and geographic data, reusing interactive data applications, and using map-based features and components.
Because of the nature of geospatial data, their use in GIS, and how a GIS handles them, some understanding of data management is often a prerequisite for working with GIS. Introductions to geospatial data use are available in Anita Graser’s “Learn QGIS” or in Esri Press’s “Getting to know ArcGIS…” and “GIS Tutorial for…” series. Although data can be created in a GIS, a GIS is more often used as a set of tools for joining geospatial data with other forms of data (e.g., tabular data with pre-existing geospatial data).
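The kind of join described above can be sketched in plain Python. This sketch assumes features and census rows are simple dictionaries sharing a hypothetical `tract_id` key; a GIS performs the same kind of attribute join through its own tools, and the tract IDs and population figures here are made up.

```python
def attribute_join(features, table, key):
    """Hypothetical helper: left join table rows onto features by a shared key.

    Features with no matching row keep their geometry but gain no columns,
    much as a GIS attribute join leaves unmatched records with empty values.
    """
    lookup = {row[key]: row for row in table}
    joined = []
    for feat in features:
        row = lookup.get(feat[key], {})
        merged = dict(feat)
        merged.update({k: v for k, v in row.items() if k != key})
        joined.append(merged)
    return joined

# Made-up census tract features and a made-up census table
features = [
    {"tract_id": "A001", "centroid": (-79.39, 43.65)},
    {"tract_id": "A002", "centroid": (-79.41, 43.66)},
]
census = [
    {"tract_id": "A001", "population": 4200},
    {"tract_id": "A003", "population": 3100},
]
result = attribute_join(features, census, "tract_id")
print(result[0]["population"], "population" in result[1])  # 4200 False
```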
More general RDM topics, including file management, are dealt with in other chapters of this textbook. This chapter instead focuses on regional projects, completed or under way at Canadian academic libraries, that manage and preserve geospatial research data; these projects also illustrate how geospatial data are created and how geospatial research data are managed. Highlights include projects that emphasize making geospatial research data discoverable, publicly accessible, and reusable for a broad variety of audiences and users.
Management and reuse of geospatial research data require reflecting on the physical space(s) from which the data were collected or to which they refer. There has been a move toward geospatial discovery that integrates base maps with text-based search, often including a geographic display and preview of datasets (see, for example, OCUL Scholars GeoPortal or Land Information Ontario’s Geohub). The data are then either displayed in a reduced format directly over the base map or represented as a bounding box showing the geographic extent of the available data. Geospatial RDM also requires more robust infrastructure, as some of the regional work described in this chapter shows. Because this infrastructure generally costs more, the long-term storage and discovery of geospatial data tend to be consortial rather than individual projects.
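A bounding box like the one used in these previews is simply the minimum and maximum coordinates of a dataset. The sketch below assumes coordinates are (longitude, latitude) pairs in a single coordinate reference system; the function name and sample points are hypothetical.

```python
from typing import Iterable, Tuple

# (min_x, min_y, max_x, max_y), i.e. (west, south, east, north) for lon/lat data
BBox = Tuple[float, float, float, float]

def bounding_box(coords: Iterable[Tuple[float, float]]) -> BBox:
    """Hypothetical helper: return the extent of a set of (x, y) coordinates."""
    coords = list(coords)
    if not coords:
        raise ValueError("cannot compute the extent of an empty dataset")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

# Made-up sample points (longitude, latitude)
points = [(-66.1, 45.3), (-63.6, 44.6), (-60.0, 46.1)]
print(bounding_box(points))  # (-66.1, 44.6, -60.0, 46.1)
```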
Forms of Geospatial Data
While many data forms can include geospatial elements (e.g., a variable for city, census division, or address), geospatial data also take two distinct forms: raster and vector. Raster data consist of a matrix of cells organized into rows and columns, with each cell containing a value (e.g., a colour or an elevation); raster data are often represented visually. For example, a scanned map or drawing is raster data, as is satellite imagery (Esri, 2016).
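To make the idea of a cell matrix concrete, the sketch below stores a tiny raster as nested lists and uses a six-value affine geotransform (the coefficient ordering used by GDAL) to convert a cell's row and column to map coordinates. The elevation values, origin, and 30 m cell size are invented for illustration, and no rotation is assumed.

```python
# A tiny 3 x 4 raster of made-up elevations in metres (rows x columns)
elevation = [
    [120, 125, 130, 128],
    [118, 122, 127, 131],
    [115, 119, 124, 129],
]
# Assumed geotransform, GDAL ordering:
# (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height)
# pixel_height is negative because row numbers increase southward
geotransform = (500000.0, 30.0, 0.0, 5000000.0, 0.0, -30.0)

def cell_to_map(row, col, gt):
    """Map coordinates of the upper-left corner of cell (row, col)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def cell_centre(row, col, gt):
    """Map coordinates of the centre of cell (row, col)."""
    return cell_to_map(row + 0.5, col + 0.5, gt)

print(elevation[1][2], cell_to_map(1, 2, geotransform), cell_centre(1, 2, geotransform))
# 127 (500060.0, 4999970.0) (500075.0, 4999955.0)
```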
Vector data is a representation of real-world features or phenomena in a GIS, with underlying data to allow for connections between the feature(s) and other forms of data. Vector data can be divided into point, line, and polygon data. Point data are single vertices or locations in space (e.g., the location of a tree); line data, or polyline data, are two or more vertices where the first and last are not equal, showing a line or series of lines (e.g., a road); and polygon data are three or more vertices where the last vertex is equal to the first, forming a closed shape (e.g., the boundary of a property, area, or province) (QGIS, n.d.).
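The vertex rules above can be expressed as a small classifier. This is a hypothetical sketch, not QGIS code; it assumes that a polygon's closing vertex is stored explicitly, so a polygon with three distinct vertices has four entries (the convention used by formats such as GeoJSON).

```python
def classify(vertices):
    """Hypothetical classifier applying the point/line/polygon rules above.

    Assumes a polygon stores its closing vertex explicitly, so three distinct
    vertices appear as four entries with the first repeated at the end.
    """
    n = len(vertices)
    if n == 1:
        return "point"
    if n >= 4 and vertices[0] == vertices[-1]:
        return "polygon"
    if n >= 2 and vertices[0] != vertices[-1]:
        return "line"
    raise ValueError("vertex sequence matches none of the rules")

shapes = [
    [(3, 5)],                              # e.g. a tree
    [(0, 0), (2, 1), (4, 0)],              # e.g. a road
    [(0, 0), (4, 0), (4, 3), (0, 0)],      # e.g. a property boundary
]
print([classify(s) for s in shapes])  # ['point', 'line', 'polygon']
```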
Tabular geospatial data commonly exist in a table or comma-separated values (CSV) format. The geographic information can be as simple as an address or a geographic place name (e.g., “Unama’ki”) or as detailed as a set of coordinates, an extent, a spatial identifier, and a hierarchy of geographic names and identifiers.
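A minimal sketch of reading such a table with Python's standard `csv` module follows. The column names are assumptions, the Halifax coordinates are approximate and for illustration only, and the Unama’ki row deliberately carries only a place name, as in the simplest case described above.

```python
import csv
import io

# Column names are assumptions; coordinates are approximate and illustrative
CSV_TEXT = """place_name,longitude,latitude
Unama’ki,,
Halifax,-63.57,44.65
"""

def read_places(text):
    """Hypothetical helper: read place rows, keeping coordinates when present."""
    places = []
    for row in csv.DictReader(io.StringIO(text)):
        lon, lat = row.get("longitude"), row.get("latitude")
        coords = (float(lon), float(lat)) if lon and lat else None
        places.append({"name": row["place_name"], "coords": coords})
    return places

places = read_places(CSV_TEXT)
for place in places:
    print(f"{place['name']}: {place['coords']}")
# Unama’ki: None
# Halifax: (-63.57, 44.65)
```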
Geospatial Data as Interaction
Because geospatial data represent space, they are rarely created as single, discrete datasets. Instead, they rely on interactions with other spatial datasets, including underlying spatial data that locate them within a GIS and/or further spatial data developed to initiate, extend, or conclude the analysis of the data in question. RDM in general can require planning for abstract data interactions, but geospatial RDM requires careful planning for interactions between both abstract and physical data, the different modes and methods these interactions may involve, and how the software used for analysis will handle them. Using a GIS itself requires careful data management, because the software references data at their storage location rather than copying them into the project. When a GIS project is saved, it stores the locations of its data, so moving a dataset can make the project unworkable until the user corrects the dataset’s location.
Geospatial data creation is both an end in itself and a means to further ends, including analysis, visualization, or pre-analysis project conception. Geospatial data can be created as a research output or as an aid for preparing, analyzing, or visualizing another data source. Such data are therefore both an end and an intermediary: they serve as the outcome of research (as any other dataset does), as a tool for analysis (like SPSS, NVivo, Voyant, etc.), and as a tool for data presentation (like Tableau, ggplot, etc.). Moreover, while a numeric dataset can be presented as a single file for use, a geospatial dataset requires supporting geospatial data, map projections (i.e., the many and varied means of representing a three-dimensional globe on a two-dimensional display), and coordinate reference systems (i.e., the differing systems that dictate where and how a set of geospatial data should display on a map).
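Because a project stores only the locations of its data, a simple check can reveal which layers a move has broken. The sketch below models a project as a hypothetical dictionary of layer names and file paths; real QGIS or ArcGIS project files store these references in their own formats, which this code does not read.

```python
import tempfile
from pathlib import Path

def broken_layers(project_layers, base_dir="."):
    """Hypothetical helper: names of layers whose referenced file is missing.

    project_layers maps layer names to file paths; relative paths are resolved
    against base_dir, much as a project may resolve them against its folder.
    """
    missing = []
    for name, path in project_layers.items():
        p = Path(path)
        if not p.is_absolute():
            p = Path(base_dir) / p
        if not p.exists():
            missing.append(name)
    return missing

with tempfile.TemporaryDirectory() as project_dir:
    (Path(project_dir) / "roads.shp").write_text("")  # empty stand-in for real data
    layers = {"roads": "roads.shp", "parcels": "data/parcels.gpkg"}
    missing_layers = broken_layers(layers, base_dir=project_dir)
print(missing_layers)  # ['parcels']
```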
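As one worked example of a map projection, the spherical formulas used by Web Mercator (EPSG:3857) map longitude λ and latitude φ (in radians) to x = Rλ and y = R ln tan(π/4 + φ/2), with R = 6,378,137 m. The sketch below implements only these formulas for illustration; in practice, projection and coordinate reference system transformations are normally handled by libraries such as PROJ (or its Python binding, pyproj) rather than by hand.

```python
import math

# Sphere radius used by Web Mercator: the WGS 84 semi-major axis, in metres
R = 6378137.0
# Web Mercator is conventionally cut off near 85.0511 degrees so the map is square
MAX_LAT = 85.0511287798

def to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair (degrees) to Web Mercator x/y (metres)."""
    lat_deg = max(-MAX_LAT, min(MAX_LAT, lat_deg))  # clamp to the usable range
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

# Approximate location of Ottawa, for illustration
print(to_web_mercator(-75.70, 45.42))
```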
Data as Object vs Data as Process
Finally, because geospatial data are connected and interactive in research, geospatial RDM must consider both the data produced by research and the data used in the process of research. For example, a researcher may need only part of a census boundary file from Statistics Canada and so extract that portion for analysis. The extract then becomes a research product, much as an extraction of census data would become research data. The extracted boundary file may be only a preliminary step before analysis and may itself be transformed into a different coordinate reference system and/or projection (e.g., from a Lambert Conformal Conic projection to a Web Mercator projection). The line between data prepared for research use and data created as a result of research use is therefore blurrier for geospatial data. For the purposes of this chapter, geospatial RDM includes managing some prepared datasets as well as data resulting from research. The rest of this chapter highlights projects in various regions of Canada that serve one or more of three purposes:
- Assisting with geospatial RDM now
- Preparing to assist with geospatial RDM in the future
- Illustrating the difficulties of supporting geospatial RDM
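Returning to the census boundary example, an extract can be sketched so that it also records where the subset came from, since the extract itself becomes research data. Everything here is a hypothetical illustration: the feature list, the source name, the rough bounding box, and the provenance fields are assumptions rather than a Statistics Canada format or a metadata standard, and only point features are handled (clipping polygon boundaries requires GIS geometry tools).

```python
from datetime import date

def extract_with_provenance(features, bbox, source_name, crs):
    """Hypothetical helper: keep point features inside bbox and describe the extract.

    bbox is (min_x, min_y, max_x, max_y) in the same CRS as the features.
    """
    min_x, min_y, max_x, max_y = bbox
    subset = [f for f in features
              if min_x <= f["x"] <= max_x and min_y <= f["y"] <= max_y]
    provenance = {                      # illustrative fields, not a metadata standard
        "derived_from": source_name,
        "operation": "bounding-box extract",
        "bbox": bbox,
        "crs": crs,
        "feature_count": len(subset),
        "created": date.today().isoformat(),
    }
    return subset, provenance

# Made-up point features in longitude/latitude (EPSG:4326)
features = [
    {"id": 1, "x": -79.4, "y": 43.7},
    {"id": 2, "x": -75.7, "y": 45.4},
    {"id": 3, "x": -123.1, "y": 49.3},
]
subset, provenance = extract_with_provenance(
    features, (-95.2, 41.7, -74.3, 56.9), "example_boundary_file", "EPSG:4326")
print(len(subset), provenance["operation"])  # 2 bounding-box extract
```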
Regional Geospatial Projects
As previously noted, the needs of geospatial data management are such that solutions for access and preservation are best sought consortially rather than through individual institutions. The various ways that regional consortia are working on geospatial research data management solutions are highlighted below.
Atlantic Canada
As of 2023, there is no shared or consortial method of geospatial data storage and delivery in Atlantic Canada, despite Nova Scotia’s status as a forerunner in shared library systems (Marshall, 1999, p. 134). However, data librarians in academic institutions have discussed this topic and identified a need. Discussions have also begun with other consortial systems, particularly Scholars GeoPortal in Ontario. There is some optimism for a national system, run either as a shared consortial system or through the Digital Research Alliance of Canada’s university library associates. These discussions remain preliminary and informal, but it is worth recording that they are occurring (DLI-Atlantic, Personal Communications, Feb–Mar 2022).
Because there are no provincial or regional shared systems, geospatial RDM is left entirely to local institutions, wherever geospatial research data have been recognized as a distinct need. Each institution has taken its own approach to RDM, dictated primarily by institutional and librarian capacity for program development, and therefore also its own approach to geospatial RDM. Often, especially at smaller institutions, this means handling questions as they arise (i.e., if a faculty member approaches a librarian, the librarian will seek a suitable outcome where feasible, often either a Dataverse deposit or a locally housed dataset). While such an ad hoc system is not ideal for the storage, use, and discovery of geospatial research data, it remains the best option with limited resources (DLI-Atlantic, Personal Communications, Feb–Mar 2022).
Many institutions have opted to use Borealis (formerly Scholars Portal Dataverse) as their institutional data repository for researcher-created data. (See Lunaris, n.d., for a listing of institutions’ data repositories and their hosting platforms. Lunaris, formerly the discovery service for the Federated Research Data Repository [FRDR], is separate from Borealis but draws on those institutional repositories to offer a tool for finding data and navigating local repositories.) Dataverse instances offer improved discovery; however, Dataverse lacks a robust geospatial display tool or discovery platform. This gap was partially addressed by the Geodisy tool (see ubc-library (2022) and other references in this chapter), whose map search now runs within Lunaris. These systems do not display data or allow clipping to particular areas; they offer only a basic display of data coverage. A large gap therefore remains in geospatial data searching and reuse and in geospatial research data storage and service.
Dalhousie University’s GIS Centre has the most developed system in the Atlantic region. A portal built on Esri’s ArcGIS Hub is being developed to provide access to all datasets held or licensed by the university. It allows geospatial searching and preview, as well as preliminary clipping before download. However, because it houses licensed datasets, access is restricted to Dalhousie users, so researchers at other institutions remain without a comparable way to find and obtain these data.
Québec
In Québec, each university library managed and disseminated geospatial data independently, in a somewhat automated manner, until 2019. In 2015, a historic agreement between the Bureau de coopération interuniversitaire (BCI) and the Ministère de l’Énergie et des Ressources naturelles (MERN) encouraged a new way of managing and disseminating geospatial data within the Québec university network.
Until 2015, all Québec universities had to purchase government data individually and could not share it with one another because of licensing agreements. Overnight, the BCI-MERN agreement allowed universities to use and share more than 250 layers representing 50 terabytes (TB) of data. But how could this amount of data be managed and shared? Not all universities had an adequate platform to organize and disseminate these geospatial data for teaching and research.
To encourage inter-university collaboration and the pooling of processes and resources, the library at l’Université Laval agreed to share its geospatial expertise and know-how by creating a shared platform managed by l’Université Laval and accessible to participating libraries. Their solution would integrate all the functionalities required to discover, visualize, and extract geospatial data and load it in a secure and efficient environment.
The result was Géoindex, a unique infrastructure accessible to 18 Québec universities via 18 entry points configured by each institution according to their preferences. Thanks to its powerful spatial and textual search engine, this platform makes it easy to discover, visualize, and extract geospatial data and aerial photographs to support teaching and research. Note that Géoindex is available in two modules: the Géospatial module and the Géophoto module, both described below.
The BCI-MERN agreement was used as leverage to develop Géoindex, but the platform can host and disseminate other geospatial data from various sources managed under different licences. Géoindex therefore includes licensed data from the agreement, such as LIDAR data, which give researchers new ways to interpret the territory. It also includes data from research projects, such as L’Atlas de vulnérabilité, which illustrates, among other things, the heat wave sensitivity index, and bathymetric data from the Arctic collected by the research icebreaker Amundsen. Each layer of information is described according to a metadata profile (UL Profile) that meets the criteria of the North American Profile (NAP) of the ISO 19115 standard. L’Université Laval’s subject headings directory (Répertoire de vedettes-matière [RVM]) is used to standardize subject descriptions.
The data are accessible to the entire university network, and some are also open to the general public, including more than 250 topographic maps dating from 1909 to 2000. Géoindex also allows the library at l’Université Laval to showcase historical documents from its collections, such as topographic maps, and even older documents, such as a map of John Franklin’s first expedition to the Canadian North in 1819, which the library digitized and georeferenced to give it a second life.
The Géophoto module, dedicated to retrieving the aerial photographs integrated into Géoindex, supports teaching and research by facilitating the discovery of geographic information. Since the module was enhanced in 2022, users who switch to it can consult the entire inventory of aerial photographs held by Québec universities: more than 1,200,000 aerial photographs dating from the 20th century. This primary information, or raw data, is very important for understanding the territory as it was at a specific time. A renewed agreement between the BCI and MERN will also add more than 1,000,000 aerial photographs digitized by MERN by 2026. As of February 2023, there were already 400,000 digitized copies available in the Géophoto module.
Although the Géoindex platform can host and disseminate geospatial data from research projects, it was not specifically designed for this type of data. For example, deposited data do not receive a DOI and the metadata are not exposed on the open web and, thus, not harvestable by other search engines. However, in future updates, the plan is to make metadata found in Géoindex open and accessible to other search engines.
For the moment, the amount of geospatial data from research projects in Géoindex is not very significant. However, the discovery, visualization, and extraction capabilities will likely increase the amount of geospatial research data over the next few years, without replacing traditional research data repositories like Dataverse. Géoindex should be seen as complementary to traditional repositories with links between them for easy discovery and retrieval.
Ontario
Libraries in Ontario have a long history of collaborating on discovery and management systems for shared collections, coordinated via the Ontario Council of University Libraries (OCUL). As noted in chapter 4, “Canadian Research Data Management: History and Landscape,” OCUL, established in 1967, is a consortium of all twenty-one university libraries in Ontario. It is involved in the collective purchasing, storage, and delivery of library resources and services. The infrastructure behind the shared systems is supported by Scholars Portal, OCUL’s digital infrastructure provider, whose librarians, systems administrators, and developers are staff at the University of Toronto Libraries. This province-wide, consortium-driven infrastructure hosts a variety of shared collections, and Scholars Portal has been involved in building, maintaining, and supporting a range of access platforms for data collection, delivery, and end-user support. These include publication collections, such as Scholars Portal Journals and Scholars Portal Books, as well as platforms focused on microdata and geospatial data, including the Scholars GeoPortal, ODESI, and Borealis. A variety of shared licensed collections, open digital collections, and archival collections are hosted and provided to academic researchers at participating member institutions.
The OCUL Geo Community (formerly the OCUL Map Group) was instrumental in the development of the Scholars GeoPortal in 2012. Scholars GeoPortal is a web-based data discovery tool that provides access to licensed and commercial data, national source data collections, regional government and open data, and raster imagery data, including government-derived projects and acquisitions and digital maps. The application is a custom build that combines Esri technology with other software already in use at Scholars Portal. It relies on ArcGIS Server for back-end data storage and serving, and it uses Esri’s API tools to visualize and download the data stored there through a custom front-end GIS. This GIS also serves as a shared catalogue and data discovery tool, supported by a robust metadata editor that produces ISO 19115 compliant metadata stored in a MarkLogic XML database. A redevelopment project is currently underway to upgrade the GeoPortal, secure the future of the platform, and ensure that it continues to meet the needs of the community. Integrations with Borealis (which is discussed in a national and regional context in chapter 4) are being explored as part of the redevelopment work.
OCUL libraries have facilitated access to geospatial data by developing shared infrastructure and licensing data products. They have also been actively involved in special projects and initiatives both within Ontario and in the broader Canadian context. The historical topographic maps project has led to the scanning of over 1,000 topographic maps at the 1:25,000 and 1:63,360 scales, covering the years 1906–1977. Work is now underway on a larger project that reuses these workflows for the 1:50,000 National Topographic System (NTS) map collection and ingests these maps into both the GeoPortal and Borealis, integrating the collection more closely with Canada’s national research data infrastructure (e.g., Lunaris). To date, over 6,000 maps from the 1:50,000 collection have been made available in this way.
The Ontario Library Research Cloud (OLRC) is a collaboration of Ontario’s university libraries to build a high-capacity, geographically distributed cloud storage network using open source technologies. The OLRC is designed to house large volumes of digital content, to allow cost-effective and sustainable long-term preservation, and to support data and text mining research tools. Several OCUL institutions currently use it to preserve their geospatial data and ensure long-term access. Permafrost, a service that builds on the OLRC, supports workflows for creating Archival Information Packages (AIPs) using a consortially managed and supported instance of Archivematica, a suite of open source tools developed by Artefactual to assist in the ingest and preservation of digital objects. In some cases, Permafrost is connected to repositories. McMaster University Library’s Islandora instance, which includes over 12,000 maps, plans, and aerial photos from the Lloyd Reeds Map Collection, is one example of the value of this infrastructure: its data are backed up automatically and regularly in the OLRC and stored as AIPs in the library’s digital archive.
As datasets continue to grow in size, Scholars Portal identified a need for new technical solutions to support the transfer of large datasets within academic library data services. This need became more urgent during the COVID-19 pandemic, when restrictions on contact made existing workflows impossible in a remote environment. Scholars Portal developed a solution using Globus, a data transfer tool that supports workflows for large file transfers and direct storage to research environments. OCUL is currently exploring a deeper integration of Globus as part of the Scholars GeoPortal redevelopment.
Standardized metadata are equally vital for the search and discovery of geospatial data. During the development of the GeoPortal, OCUL did transformative work by recommending and adopting the ISO 19115 standard and Canadian controlled vocabularies from federal and provincial government agencies. These standards for creating dataset- and series-level metadata have enhanced discovery, search, and access to the collections. Instruction on geospatial metadata from Scholars Portal staff has also strengthened understanding of the importance of geospatial metadata standards across the OCUL community. These standards have been applied to local collections and special projects alike.
Prairies
The Council of Prairie and Pacific University Libraries (COPPUL) is an association of university libraries in Western Canada that includes twelve members from the Prairie provinces of Alberta, Saskatchewan, and Manitoba, which are listed in Table 1. The capacity of library staff to meet the demand for specialized geospatial services varies considerably. Several libraries do not offer any geospatial or GIS services, while those at larger academic institutions (e.g., Calgary, Alberta, Manitoba) offer a more extensive suite of geospatial data services. These libraries serve student and faculty populations of disparate sizes and support different academic programs with differing RDM requirements. As a result, the services they offer vary greatly. These services relate to (1) providing access to geospatial data produced by external agencies; (2) creating geospatial and GIS-related products relevant to producing new research; and (3) managing geospatial data produced by local researchers as a result of research activity.
Table 1: Geospatial and RDM services at COPPUL Prairie member libraries
University | Province | Geospatial/GIS Data LibGuide | Geospatial/GIS Data Catalogue | RDM Dataverse Repository | RDM Geospatial Dataset Availability |
Athabasca | AB | ❌ | ❌ | ❌ | ❌ |
Concordia | AB | ❌ | ❌ | ❌ | ❌ |
MacEwan | AB | ✅ | ✅ | ✅ | ❌ |
Mount Royal | AB | ✅ | ❌ | ✅ | ❌ |
Alberta | AB | ✅ | ❌ | ✅ | ✅ |
Calgary | AB | ✅ | ✅ | ✅ | ✅ |
Lethbridge | AB | ✅ | ❌ | ❌ | ❌ |
Regina | SK | ✅ | ❌ | ✅ | ✅ |
Saskatchewan | SK | ✅ | ❌ | ❌ | ✅ |
Brandon | MB | ❌ | ❌ | ✅ | ❌ |
Manitoba | MB | ✅ | ✅ | ✅ | ✅ |
Winnipeg | MB | ✅ | ❌ | ✅ | ❌ |
COPPUL Prairie libraries have been actively involved in creating geospatial and GIS-related products to help patrons find and use geospatial datasets from within their collections. The types of geospatial materials most frequently included in these products are historical maps, topographical maps, aerial imagery, digital elevation models (DEMs), and climate and environmental records. Examples of specific initiatives include:
- Spatial & Numeric Data Services (SANDS) at the University of Calgary Library, which has been involved in the development of numerous mapping applications that provide access to rare historical maps (e.g., sectional maps of the Canadian Prairies, township plans of Alberta, fire insurance plans of Calgary). Original maps were scanned and georeferenced in order to visualize their geographic locations on an Esri web map for downloading.
- Phase Six of COPPUL’s Shared Print Archive Network (SPAN), which was tasked with identifying historical western Canadian topographic maps (1:25,000 and 1:63,360 NTS series) for preservation and research. Identification of these maps opens possibilities for future digitization and visualization similar to topographic maps available from SANDS and the Ontario Scholars GeoPortal.
- The Southern Alberta Aerial Photographs collection, which displays the geographic locations of vertical aerial photos available for download using a Leaflet web map and CONTENTdm digital library software. The University of Saskatchewan Library Archives and Special Collections created a similar web map identifying the locations of oblique photos from the Howdy McPhail Aerial Photograph collection.
COPPUL Prairie libraries are involved, to different extents, in managing and curating data (including geospatial data) produced by local researchers. Eight of the twelve COPPUL Prairie member libraries currently use Dataverse repositories to host and share datasets on behalf of members of their scholarly communities (see Table 1). Seven of these libraries participate in the externally hosted Borealis service, while the University of Manitoba manages its own Dataverse implementation. The overall number of datasets deposited in and available from Prairie Dataverse repositories (1,099 in total as of March 2022) is relatively modest but growing.
Prairie universities also publish their datasets to discipline-specific data repositories (e.g., Dryad for biosciences) or to Canada’s FRDR, which was created in partnership with the University of Saskatchewan and several other Canadian universities. FRDR can be searched through Lunaris, whose “map search,” powered by Geodisy, lets users explore and locate datasets originating from specific Canadian geographic regions using a web map.
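The core test behind a map-based search can be sketched as a bounding-box overlap check. This is not Geodisy's or Lunaris's implementation; the dataset names and extents below are hypothetical, and the simple comparison ignores boxes that cross the antimeridian.

```python
def intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def map_search(datasets, query_box):
    """Hypothetical helper: names of datasets whose extent overlaps the query box."""
    return [name for name, box in datasets.items() if intersects(box, query_box)]

# Hypothetical dataset extents (west, south, east, north) in degrees
datasets = {
    "prairie_soils": (-115.0, 49.0, -95.0, 55.0),
    "atlantic_coast": (-67.0, 43.0, -52.0, 52.0),
    "northern_lakes": (-108.0, 58.0, -100.0, 62.0),
}
query_box = (-110.0, 49.0, -101.4, 60.0)  # a box drawn on the web map
print(map_search(datasets, query_box))  # ['prairie_soils', 'northern_lakes']
```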
In May 2022, the University of Manitoba Libraries released its GISHub geospatial data repository. Initially, the project was conceived to be a secure local storage solution for geospatial data, but it was later re-imagined by incorporating tools available under an Esri site license. It aims to provide a discovery and access point for both proprietary and open researcher data and a secure local environment for active-use geospatial datasets.
For institutions without a Dataverse instance, locally created geospatial research data may be shared in FRDR or other venues. For example, although not a data repository, the University of Saskatchewan’s institutional repository, HARVEST, hosts a small number of geospatial research datasets. It is reasonable to expect that as COPPUL libraries implement their RDM strategies to meet requirements in the Tri-Agency Research Data Management Policy, we will see greater consistency in how, when, and where geospatial research data are shared.
British Columbia
The geospatial research data ecosystem in British Columbia is defined by the services that the province’s academic institutions and public organizations provide. British Columbia’s policies for openly sharing data have enabled users to search and access a wide variety of open data using the BC Data Catalogue and several other specialized platforms for acquiring other province-wide geospatial data, such as LidarBC and BC Land Title and Survey’s ParcelMap BC. At a more granular level, several of British Columbia’s regional districts and municipalities have made data available through localized data discovery platforms, such as the City of Surrey Open Data catalogue and the Metro Vancouver Open Data Portal.
Within British Columbia’s academic sphere, postsecondary institutions follow independent geospatial data collection policies based on local administrative, teaching, and research requirements. Four institutions whose libraries are the main holders of geospatial data collections belong to the Abacus Data Network: Simon Fraser University, the University of British Columbia (UBC), the University of Northern British Columbia, and the University of Victoria. UBC Library maintains the infrastructure that supports Abacus. Each member university is assigned its own subset of the network, in which authenticated users from that institution can access data licensed only for use on their campus. This arrangement supports localized collection development and data curation.
Approximately 20% to 30% of the data stored in Abacus are geospatial. However, the software underlying Abacus, Dataverse, is not designed to provide specialized support for finding and using geospatial data. Recognizing this, UBC Library created middleware to connect Dataverse to a geo-specific stack of open source software, including GeoServer and GeoBlacklight. This project, Geodisy (Phase 1), was funded by CANARIE between October 2018 and March 2020. A second phase then began, funded by the National Digital Research Infrastructure Organization (NDRIO, now the Digital Research Alliance of Canada, or “the Alliance”) and administered through Canada’s Lunaris discovery service. Geodisy now powers the map search in Lunaris.
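One small task such middleware must handle is converting a repository's extent metadata into a form a spatial search index understands. GeoBlacklight records commonly express extents as Solr `ENVELOPE(west, east, north, south)` strings (for example, in the `solr_geom` field of its 1.0 schema). The sketch below shows only that conversion; the input keys are hypothetical rather than Dataverse's own field names, the extent is an approximate box around British Columbia, and this is not Geodisy's code.

```python
def to_envelope(west, east, north, south):
    """Format an extent as a Solr ENVELOPE(minX, maxX, maxY, minY) string."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude out of range, or south exceeds north")
    return f"ENVELOPE({west}, {east}, {north}, {south})"

# Hypothetical extent record; keys are illustrative, not Dataverse's field names
dataverse_extent = {"west": -139.06, "east": -114.03, "north": 60.0, "south": 48.3}
print(to_envelope(**dataverse_extent))  # ENVELOPE(-139.06, -114.03, 60.0, 48.3)
```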
Future Directions
Currently, geospatial research data management depends on regional solutions developed on an as-needed basis. Although librarians work to anticipate future needs, constraints on time and workload mean that the field largely reacts to developments in RDM as a whole. The geospatial realm still has particular demands that call for creative solutions to manage these data for current and future use. Solutions to many of these problems have moved, or are moving, toward shared and consortial models, and they will likely continue in that direction, perhaps culminating in a national geospatial research data repository. Reaching that point will require more concerted discussion of geospatial metadata and more work on geospatial access platforms; these solutions will likely be developed through the regional approaches described above.
While current challenges and proposed solutions have been discussed, it is also worth noting some current gaps in content caused by data biases. The Indigenous Mapping Workshop, presented by the Firelight Group, has promoted the growth of GIS and geospatial data among Indigenous nations, but progress on settler-Indigenous relations in academia remains slow in this area. Similarly, geospatial data are subject to the same systemic biases against Black and other non-white people in data creation and use as the data world overall, and work to address them is slow. Linguistically, Québec has shown leadership in multilingual data access through bilingual metadata translation; however, other provinces lag in creating and disseminating non-English metadata. Finally, the Canadian geospatial landscape has long favoured the south, and although efforts were made to bring in expertise in Northern Canadian geospatial RDM, this area remains underexplored.
It may seem trite to describe the field of geospatial research data as simultaneously nascent and developed. However, a concerted effort is being made to expand on the work already done and to bring geospatial RDM in line with the needs of researchers and libraries across the country. Work is ongoing, particularly through the Digital Research Alliance of Canada and the academic consortia outlined above.
Reflective Questions
- How are geospatial data unique, and how does this impact considerations for geospatial Research Data Management?
- Is geospatial Research Data Management better handled by local institutions, by regional consortia, or through national infrastructure investment? What are the benefits and drawbacks of each method?
- Research Data Management requires infrastructure to support it. What infrastructure currently exists? What gaps do you think need to be addressed in order to improve the preservation, access, and use of geospatial research data?
Key Takeaways
- Geospatial data can involve a complex interplay of datasets, but managing them requires thinking first about how the data represent space.
- Managing individual geospatial datasets is closely related to general research data management, and resources already exist for learning more in this area.
- Regional projects across the country are working to support the preservation of, and access to, geospatial research data within the larger geospatial data field.
- Postsecondary institutions are leading these regional projects on an as-available basis.
Additional Readings and Resources
The Digital Research Alliance of Canada offers a number of resources on data management and best practices, as well as expert groups that discuss these topics. See the Digital Research Alliance of Canada’s Network of Experts and the Dataverse North Metadata Best Practices Guide for more.
A white paper written for NDRIO (now the Digital Research Alliance of Canada) addresses Canada’s current and future needs for geospatial data infrastructure and outlines the particular requirements of geospatial data:
Brodeur, J., Handren, K., Berish, F., Chandler, M., Fortin, M., Leahey, A., & Stevens, R. (2020). Enabling broad reuse of Canada’s geospatial data and digitized cartographic materials. A response to the NDRIO Call for White Papers on Canada’s Future DRI. https://alliancecan.ca/sites/default/files/2022-03/final-enabling-broad-reuse-of-canadas-geospatial-data-and-digitized-cartographic-materials.pdf
For introductory GIS learning, see QGIS’s publicly available training materials.
Reference List
Bellin, J. (1764). Port de Louisbourg. Paris: J.N. Bellin.
Esri. (2016). What is raster data? ArcMap: Manage data. https://desktop.arcgis.com/en/arcmap/10.3/manage-data/raster-and-images/what-is-raster-data.htm
Lunaris. (n.d.). Source repositories. Retrieved August 10, 2023, from https://www.lunaris.ca/en/source_repositories
Marshall, P. (1999). Novanet, Inc.–Nova Scotia, Canada. Information Technology and Libraries, 18(3), 130–134. https://www.proquest.com/docview/215830105
OpenStreetMap contributors. (2023). Planet dump [Data file from 2023]. https://planet.openstreetmap.org
QGIS. (n.d.). Vector data: Overview. Documentation: QGIS 2.8. qgis.org [website]. https://docs.qgis.org/2.8/en/docs/gentle_gis_introduction/vector_data.html
Statistics Canada. (2019). 2016 census boundary files: Dissemination areas [Cartographic boundary file]. https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lda_000b16a_e.zip&k=%20%20%20%2090414&loc=http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/files-fichiers/2016/lda_000b16a_e.zip
Stock, K., & Guesgen, H. (2016). Chapter 10 – Geospatial reasoning with open data. In R. Layton & P. A. Watters (Eds.), Automating open source intelligence (pp. 171–204). Syngress. https://doi.org/10.1016/B978-0-12-802916-9.00010-5
ubc-library. (2022). Geodisy. GitHub. https://github.com/ubc-library/geodisy
a term that describes all the activities that researchers perform to structure, organize, and maintain research data before, during, and after the research process.
the ability of data or tools from non-cooperating resources to work with or communicate with each other with minimal effort using a common language.
data arranged in the form of tables, i.e., in rows and columns.
an underlying or reference map that sits beneath the data to give them context. For example, if you make a map showing demographic information for particular census areas, the map is harder to read without something to indicate where those abstract census-area shapes are. Although a map is itself an abstract representation, it is one that people learn to read, so it can supply positional information that orients the reader; the base map provides that positional information to situate the data displayed on top of it.
data that represents spaces as a regular grid or series of cells, each with a particular value – often thought of as the pixels of an image. For example, a scanned historical map or an air photo.
data that comprises individual points that refer to specific locations. These points can be joined to form lines or enclosed shapes (polygons). The points, lines, and polygons can each be treated as individual units with associated data.
the visual representation of a geographic dataset in any digital map environment. Conceptually, a layer is a slice or stratum of the geographic reality in a particular area, and is more or less equivalent to a legend item on a paper map. On a road map, for example, roads, national parks, political boundaries, and rivers might be considered different layers (ESRI, n.d.).
data about data; data that define and describe the characteristics of other data.
a name (not a location) for an entity on digital networks. A DOI provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI is a type of Persistent Identifier (PID) issued by the International DOI Foundation. This permanent identifier is associated with a digital object that permits it to be referenced reliably even if its location and metadata undergo change over time (CODATA Research Data Management Terminology, CC BY 4.0).
online, free of cost, accessible data that can be used, reused, and distributed provided that the data source is attributed.
the process of connecting different, often disparate systems or tools into a cohesive infrastructure.
when software is open source, users are permitted to inspect, use, modify, improve, and redistribute the underlying code. Many programmers use the MIT License when publishing their code; it requires that the original copyright notice and license text be included in all copies or substantial portions of the software.
an Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within an OAIS (OAIS term). (Digital Preservation Handbook, n.d.).
an OCLC tool for managing and presenting digital content. See https://www.oclc.org/en/contentdm.html for more information.
aerial photograph taken with the axis of the camera held at an angle between the horizontal plane of the ground and the vertical plane perpendicular to the ground. A low oblique image shows only the surface of the earth; a high oblique image includes the horizon (ESRI, n.d.).
a policy applying to data collected with research funding from one of Canada's three federal funding agencies. The policy is intended to encourage better research by requiring researchers to create data management plans and preserve their data.
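The raster and vector definitions above describe two different data structures. The following minimal Python sketch assumes a north-up grid with square cells and a simplified feature model whose geometry type names echo GeoJSON. The class and method names are hypothetical and are not taken from any GIS library. In the sketch, a raster answers "what value is at this location?" by converting coordinates to a cell index, while a vector feature keeps its vertices and attribute data together as one unit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Raster:
    """A north-up grid of square cells, each holding one value."""
    origin_x: float            # x coordinate of the grid's upper-left corner
    origin_y: float            # y coordinate of the grid's upper-left corner
    cell_size: float           # width and height of one cell, in map units
    values: List[List[float]]  # values[row][col]; row 0 is the top row

    def value_at(self, x: float, y: float) -> float:
        """Return the value of the cell containing the point (x, y)."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError("point falls outside the raster")
        return self.values[row][col]


@dataclass
class Feature:
    """A vector feature: geometry built from points, plus its attribute data."""
    geometry_type: str                      # e.g. "Point", "LineString", "Polygon"
    coordinates: List[Tuple[float, float]]  # vertices as (x, y) pairs
    attributes: Dict[str, object] = field(default_factory=dict)

    def bounds(self) -> Tuple[float, float, float, float]:
        """Return (min_x, min_y, max_x, max_y) for the feature's vertices."""
        xs = [x for x, _ in self.coordinates]
        ys = [y for _, y in self.coordinates]
        return min(xs), min(ys), max(xs), max(ys)
```

For instance, a 3 × 3 grid with its upper-left corner at (0, 30) and 10-unit cells returns the top-middle cell’s value for the point (15, 25). A closed polygon such as `Feature("Polygon", [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)], {"name": "Parcel A"})` carries both its shape and its attributes, which is what allows each point, line, or polygon to be treated as an individual unit.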