Data quality can be defined as fitness for purpose: how suitable a dataset is for
satisfying particular needs or fulfilling certain requirements to solve a problem
(Coote & Rackham, 2008). Quality is a major concern because it determines the
limits of use for any dataset, and it is key to putting GIS products into an
understandable form (Paradis & Beard, 1994). As identified by Van Oort
(2006), spatial data quality has become an increasing concern for two reasons:
(1) the emergence of Geographical Information Systems (GIS) in the 1960s, and
(2) a strong increase, from the 1970s onwards, in available spatial data from
satellites. He also states that the number of users from non-spatial disciplines
has grown due to the large-scale adoption of GIS. This is certainly the case for Volunteered
Geographical Information (VGI) and neogeography applications. The quality of
geographic data can be assessed against both subjective and quantitative
quality elements. Based on the ISO standards for the quality principles of
geographic information, Coote and Rackham (2008) outline how both these
quality elements can be assessed:
Subjective elements provide a valuable initial indication of how useful a
particular dataset is going to be for certain purposes. They usually fall under
three headings:
• Purpose – the rationale for creating the dataset
• Usage – the application to which the dataset has been put
• Lineage – the history of the dataset
Quantitative elements imply a quality evaluation involving measurement and an
objective result. They are categorized as follows:
• Positional accuracy: the accuracy of the position of features or geographic
objects in either two or three dimensions. It can be expressed as absolute
accuracy (the closeness of coordinate values to values accepted as true),
relative accuracy (the closeness of the relative positions of objects in a
dataset to those relative positions accepted as true), or gridded data position
accuracy (the closeness of gridded data position values to those accepted as
true).
• Temporal accuracy: the accuracy of temporal attributes, such as dates and
times, and of the temporal relationships of features, such as ‘later than’ or
‘earlier than’ relationships. It can be expressed as the accuracy of time
measurement (whether the recorded dates of objects are correct), temporal
consistency (the correctness of ordered events), or temporal validity (the
validity of data with respect to time).
• Thematic accuracy: the accuracy of quantitative attributes (such as
population), the correctness of non-quantitative attributes (such as geographic
names), and the correctness of classifications (how correct the classes
assigned to attributes are in relation to ground truth).
• Completeness: the presence and absence of objects in a dataset at a
particular point in time. Errors can be of omission (data missing from the
dataset that should have been included at the time of capture, such as missing
streets or street names) or commission (data present in the dataset that should
have been omitted, such as buildings now demolished).
• Logical consistency: the level of adherence to logical rules of data
structure, attribution and relationships. It can be characterized as conceptual
consistency, domain consistency, format consistency and topological consistency.
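
To make the idea of a measured, objective result concrete, the following is a minimal Python sketch of how three of the quantitative elements above might be computed against a reference dataset accepted as true. It is illustrative only and is not taken from Coote and Rackham (2008) or from the ISO standards: the function names, the use of root-mean-square error for absolute positional accuracy, the use of the reference feature count as the denominator for both completeness rates, and the sample values are all assumptions made for this example.

```python
import math


def positional_rmse(measured, reference):
    """Absolute positional accuracy as the root-mean-square error (RMSE)
    between measured 2D coordinates and reference coordinates accepted as
    true. Both lists hold (x, y) tuples in matching order (assumption)."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def completeness_rates(dataset_ids, reference_ids):
    """Omission and commission rates from feature identifiers.
    Both rates are expressed relative to the number of reference
    features (an assumption made for this sketch)."""
    dataset_ids, reference_ids = set(dataset_ids), set(reference_ids)
    if not reference_ids:
        raise ValueError("reference dataset must not be empty")
    omitted = reference_ids - dataset_ids      # should be present, but missing
    committed = dataset_ids - reference_ids    # present, but should be absent
    return {
        "omission": len(omitted) / len(reference_ids),
        "commission": len(committed) / len(reference_ids),
    }


def classification_accuracy(assigned, ground_truth):
    """Thematic accuracy as the share of features whose assigned class
    matches the ground-truth class."""
    if len(assigned) != len(ground_truth) or not ground_truth:
        raise ValueError("need two non-empty class lists of equal length")
    matches = sum(a == g for a, g in zip(assigned, ground_truth))
    return matches / len(ground_truth)


# Hypothetical sample values, chosen only to show the calculations.
rmse = positional_rmse([(3.0, 4.0), (10.0, 10.0)], [(0.0, 0.0), (7.0, 6.0)])
rates = completeness_rates({"a", "b", "d"}, {"a", "b", "c", "e"})
accuracy = classification_accuracy(
    ["road", "river", "road", "park"],
    ["road", "river", "path", "park"],
)
print(rmse)      # 5.0
print(rates)     # {'omission': 0.5, 'commission': 0.25}
print(accuracy)  # 0.75
```

In this sketch each measure reduces to a single number that can be reported alongside the dataset, which is what distinguishes the quantitative elements from the subjective ones. Other formulations (for example, a different denominator for the commission rate) are equally plausible.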
Spatial data quality is usually implicit in mapping; traditionally, the
measures of quality transferred from the surveyor to the cartographer were
understood by experts. However, the nature of digital data requires an explicit
approach to communicating the overall quality of map data; hence the expertise
and knowledge of the surveyor, cartographer or geographer needs to be passed on
to the GIS user (Coote & Rackham, 2008). Another factor to consider once an
assessment of data quality has been carried out is fitness for use. As mentioned
at the beginning of this chapter, quality can be defined as fitness for purpose.
Van Oort (2006) outlines three steps by which this can be achieved:
1. To search for a spatial dataset that contains the information needed for the intended application
2. To explore whether there are legal or financial constraints on access to, or on particular uses of, the spatial data
3. To find out whether, given the spatial data quality, the risks are acceptable