Introduction to GIS - Spatial Data Input I

    In the best of all possible worlds, spatially referenced data in the desired format already exist for your specific GIS application. Spatial datasets are becoming increasingly available, although the quality of such data is not always guaranteed. Data input is a major expense and bottleneck in implementing a GIS: the initial cost of building a database is often 5 to 10 times the cost of the hardware and software. If data do not presently exist in the desired format (here assumed to be formats compatible with ARC/INFO or ArcView), they can be brought into ARC/INFO or ArcView in a variety of ways:

    Keyboard entry - especially for attribute data.
    Digitizing an existing data source (e.g., map) to create vector data.
    Scanning an existing data source (e.g., aerial photo) to create raster data.
    Conversion of existing digital data.
    Purchase of ARC/INFO data coverages.

    Often more than one option is available and the user must make decisions regarding which method is most efficient, without sacrificing data quality. For example, if the scanned image is "messy" in that it contains more information than desired (e.g., maps with many text labels), then digitizing the map may be a better option when compared to image conversion (scanning and raster-to-vector conversion). Although the process of image conversion is less tedious than manual digitizing, the follow-up editing to delete unnecessary and distracting symbols may make this the more laborious and costly process in the long run.

    The following paragraphs elaborate on input of spatial data using various methods. Some of you may need to enter spatial data for your final independent project. We encourage you, however, to use existing datasets as much as possible for your final project due to the time-consuming task of entering and editing spatial data. Remember that entering the data is simply the first step. Often as much, or more, time is needed to "clean" the spatial data so that it is usable (i.e., accurate) or can be combined with other spatial datasets.

DIGITIZING FROM A TABLE

        Digitizing of maps and other images requires a special digitizing table (or tablet) that has a fine wire grid embedded in its surface. The underlying wire grid records the x,y locations of entered points; these data are then communicated to the computer. Digitizing can be done using the ARCEDIT subsystem in ARC/INFO, or using WinTab digitizing driver software and the Digitizer Extension in ArcView on IBM-compatible computers only (i.e., digitizing with ArcView is not possible on UNIX or Macintosh). Unless specified at the outset of digitizing, these spatially referenced data (i.e., points, lines, polygons) are in digitizing units (e.g., centimeters or inches) and may need to be converted to real world coordinates (e.g., kilometers or miles) at a later time. Control points with known locations (called tics in ARC/INFO) are used to relate digitized coordinates to real-world coordinates.

        Digitizers cost $400 - $4000. Besides hardware, much of the high cost of GIS operations comes from data entry, primarily through digitizing. Often several people are hired full-time just to digitize data! Digitizing is time consuming (and often boring). One rule of thumb is one digitized boundary per minute. (At that rate, it would take about 2 hours to digitize all 115 county boundaries in Missouri.)
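
        The rule of thumb above is simple arithmetic. A minimal sketch, assuming a constant rate of one boundary per minute (the function name is hypothetical and used only for illustration):

```python
def digitizing_hours(n_boundaries, minutes_per_boundary=1.0):
    """Estimate digitizing time in hours from a boundaries-per-minute rule of thumb."""
    return n_boundaries * minutes_per_boundary / 60.0

# 115 county boundaries in Missouri at one boundary per minute
print(round(digitizing_hours(115), 1))  # 1.9, i.e. about 2 hours
```

        This covers data entry only; the editing and cleaning time described earlier is not included.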

    There are three major steps to digitize spatial data:

        1. Prepare map or other image for digitizing.
        2. Enter (digitize) features from the map.
        3. Edit the digitized map for accuracy and to ensure quality.

        The digitizing process converts the spatial features (points, lines, areas) of a map (a physical model of reality) into digital format (i.e., into a series of x,y coordinates). This is done by manually tracing map features using a cursor or mouse. Certain guidelines help to ensure accurate and efficient data entry. First, use good quality base maps that are easy to read and on stable material (e.g., mylar) that resists shrinking or stretching; shrinking and stretching can cause many distortions, especially when digitizing occurs over several time periods (even years). Second, establish ahead of time how digitizing will proceed (e.g., all arcs digitized before points) and track your progress on the map. Third, prepare your maps by (a) locating and labeling tic points - these should be points for which you can obtain real world coordinates; (b) marking node points on the map manuscript where intersections are not entirely clear (e.g., oblique intersections); (c) adding nodes onto long arcs to increase accuracy; (d) indicating start and stop points for areas (e.g., polygons that are digitized as one arc); (e) adding unique label identifiers for each polygon.

Figure: Preparing a map manuscript for digitizing (from Understanding GIS: The ARC/INFO Method, ESRI 1994).

        Data entry usually begins with registering the tics (a minimum of 4 tics with known real world coordinates is recommended). These control points are essential for several reasons: (1) they allow several coverages to be digitized from the same map and then later related spatially; (2) they ensure that digitizing does not have to be completed in a single session; (3) they allow the new dataset to be related to known geographic locations and/or other maps; (4) they provide a way to evaluate human error in digitizing. Every time that you begin digitizing the same map manuscript again, you will reenter the tic locations in order to determine how well you match the previous set of tics. When digitizing in ARC/INFO, the ARCEDIT program provides you with the RMS (root mean square error) in digitizing units and map units between this most recent entry and the initial tic entry to let you evaluate your accuracy. Although the error you are willing to accept will ultimately depend on the functions and scale of the map, normally a limit of 0.003 or 0.004 inches (0.008 to 0.01 cm) RMS is set. If you exceed this error, you should reenter the tic values to lessen the RMS error (this is where a stable map is particularly important).

        While digitizing in ArcView, you specify initially in the Digitizer Setup window how much error you will allow by entering a value in the Error Limit field (e.g., 0.004 inches is the default value). Following entry of your control points, ArcView calculates the RMS error. If your calculated error exceeds the limit you entered in the previous step you will need to (1) re-digitize the control points, (2) re-enter the corresponding ground points (i.e., known locations), or (3) increase the error limit field before proceeding. Common causes of high RMS errors are poor digitizing, careless placement of tic locations on the map, and digitizing from a wrinkled map.
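
        The tic check described above can be illustrated with a short calculation. This is a minimal sketch, assuming a six-parameter affine transformation fitted by least squares between a re-entered set of tics and the initial set; the tic positions are hypothetical, and the computations inside ARCEDIT and ArcView may differ in detail.

```python
import math

def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):
        s = a[i][3] - sum(a[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / a[i][i]
    return x

def fit_affine(src, dst):
    """Least-squares fit of x = a*u + b*v + c and y = d*u + e*v + f."""
    rows = [(u, v, 1.0) for u, v in src]
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs_x = [sum(r[i] * p[0] for r, p in zip(rows, dst)) for i in range(3)]
    rhs_y = [sum(r[i] * p[1] for r, p in zip(rows, dst)) for i in range(3)]
    return solve3(normal, rhs_x), solve3(normal, rhs_y)

def rms_error(src, dst, coeffs):
    """Root mean square distance between transformed src tics and dst tics."""
    (a, b, c), (d, e, f) = coeffs
    total = 0.0
    for (u, v), (x, y) in zip(src, dst):
        total += (a * u + b * v + c - x) ** 2 + (d * u + e * v + f - y) ** 2
    return math.sqrt(total / len(src))

# Hypothetical tic positions in inches: the first entry and a later re-entry
# made after the map was taped down again in a slightly different place.
initial = [(1.0, 1.0), (1.0, 9.0), (11.0, 9.0), (11.0, 1.0)]
reentry = [(1.502, 0.748), (1.499, 8.751), (11.501, 8.749), (11.497, 0.752)]
ERROR_LIMIT = 0.004  # inches, the limit quoted in the text

coeffs = fit_affine(reentry, initial)
rms = rms_error(reentry, initial, coeffs)
print("RMS error: %.4f in" % rms)
print("within limit" if rms <= ERROR_LIMIT else "re-enter the tics")
```

        With four tics there are more equations than unknowns, so the fit leaves a residual at each tic; with only three tics the affine fit would be exact and the RMS error would say nothing about digitizing accuracy.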

        Following data entry in ARC/INFO, the ARCEDIT program has several commands to assist you in identifying and correcting digitizing errors (e.g., NODEERRORS <cover> will list all node errors such as dangling nodes). In addition, the coverage(s) can then be transformed to real world coordinates and projected into the desired system (e.g., Universal Transverse Mercator, State Plane Coordinates, etc.) using commands in the ARC program (PROJECT, TRANSFORM). In ArcView, digitizing is assisted by a Popup menu, which allows one to automatically snap lines to vertices, boundaries, and intersections, delete the last point, etc. In addition, digitizing can be automatically updated to real world coordinates if the map projection is set in the view and real world coordinates are identified and entered during the first stages of digitizing.
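
        To make the idea of a dangling node concrete, here is a minimal sketch that flags arc end points touched by only one arc end. It assumes exact coordinate matches and hypothetical arcs; ARCEDIT itself works with tolerances, so this illustrates the concept and not the NODEERRORS implementation.

```python
from collections import Counter

def dangling_nodes(arcs):
    """Return arc end points (nodes) that are used by only one arc end."""
    ends = Counter()
    for arc in arcs:
        ends[arc[0]] += 1    # from-node
        ends[arc[-1]] += 1   # to-node
    return sorted(node for node, count in ends.items() if count == 1)

# Hypothetical arcs: a closed triangle plus a spur that ends in open space.
arcs = [
    [(0, 0), (4, 0)],
    [(4, 0), (2, 3)],
    [(2, 3), (0, 0)],
    [(4, 0), (6, 1)],
]
print(dangling_nodes(arcs))  # [(6, 1)]
```

        A polygon digitized as a single closed arc starts and ends at the same node, so that node is counted twice and is not reported.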

        Don't be fooled by the apparent ease of digitizing - it requires a great deal of patience and labor to ensure accuracy. Moreover, there are lots of inherent problems that may or may not be beyond your control.

Map Problems:

        Maps were not created for digitizing.
        Maps may have errors or simplifications to better display information.
        Maps are not stable - control points must be entered each time map is digitized.
        Features at map boundaries may not be aligned on adjacent maps.

User Errors:

        It is difficult to accurately trace features.
        Digitizing is tiring and boring.
        Some user errors can be corrected automatically, but others require manual editing.

DIGITIZING FROM A SCANNED IMAGE ON THE SCREEN

        Scanners produce raster data. Video scanners, which are similar to television cameras, are inexpensive ($500 - $10,000) and fast, but not high quality. Electromechanical scanners are expensive ($10,000 - $100,000) and slow, but create higher quality data.

        In ARC/INFO, one can use a scanned image as a backdrop, and then only those map features that are needed can be entered using the mouse in a fashion similar to table digitizing.  Alternatively, interactive image processing programs (e.g., ArcScan) can be used to convert from raster into vector format. In a later class you will create shapefiles in ArcView using scanned images as a backdrop.

Problems with Scanning:

Documents must be clean and clear (small features may be missed).
Automatic recognition of features is not very good.
- Special symbols (e.g., marshlands) may not be recognized.
- Text may be scanned as line features.
- Complex collections of lines may be poorly captured.

CONVERSION BETWEEN SOFTWARE SYSTEMS

        Digital data in one form can be converted into a different form needed by the GIS. Data are becoming increasingly available on CD-ROMs, which are relatively inexpensive. ARC/INFO contains programs to convert many common forms of data to coverages, e.g., U.S. Census Bureau TIGER files, AutoCAD DXF files (precision line drawings), and USGS DEMs (digital elevation models), among others. ArcView 3.1 can also support AutoCAD files and MicroStation design files (.DGN files) through its CAD Reader extension, ARC/INFO interchange files (*.e00), GPS data, and USGS Digital Elevation Models (DEM) and US DMA Digital Terrain Elevation Data (DTED) through the Spatial or 3D Analyst extension.

        Automated surveying techniques allow surveying (COGO - coordinate geometry) data to be captured in the field in digital form and downloaded for input to GIS.

        GPS (Global Positioning System) receivers use signals from NAVSTAR satellites to accurately determine locations. GPS relies on a constellation of satellites orbiting the earth. At least four satellites must be visible to determine latitude, longitude and elevation. Satellite signals are weakened when passing through foliage, and they do not pass through wood more than a few cm thick, buildings or people. Prices range from about $1,000 to $60,000. Both hand-held and backpack receivers are available. Accuracies of one meter or less can be achieved at a fixed point in the open; accuracies of 1-2 meters are common under trees. Somewhat less accurate measurements are achieved while moving (e.g., on a bicycle, truck, or animal), depending on the GPS receiver. Many GPS receivers provide software to convert the data into ARC/INFO format. Such data can also be readily incorporated into ArcView as event themes.

KEYBOARD ENTRY

        A considerable amount of locational data exists within historical records (e.g., survey records, museum specimens, plant vouchers). If you can gain access to this information, then you can enter these data into ARC/INFO in ASCII format or ArcView in DBASE or ASCII format. Alternatively, you may have established a spatial grid from your own research and wish to enter this as a text file.

        In ARC/INFO, the GENERATE command in ARC can be used to create spatial data. It can be used interactively, where the user enters each x,y coordinate at the terminal on a separate line, or using an existing file of x,y coordinates. A number of different coverage types can be created using GENERATE. For example, one can create tic, point, line, polygon, or route coverages. In all cases, you need to provide an ID value for each tic, point, line, etc., and corresponding x,y coordinates. Fields contained within ASCII files can be separated by tabs (tab-delimited) or commas (comma-delimited). Examples of ASCII files for creation of point, line, and polygon coverages in ARC/INFO are:

Example 1: POINT COVERAGE (4 points)
1,77.0033, 5.1230
2,75.0345, 4.1266
3,76.8700, 5.0500
4,77.4550, 4.8805
END

Example 2: LINE COVERAGE (2 lines)
1
40000    53000
43000    49000
46000    50000
49000    47000
52000    46000
END

2
40000    53000
42000    47000
44000    44000
47000    42000
50000    40000
52000    36000
END
END

Example 3: POLYGON COVERAGE (1 polygon)
1
734661.34230,    4279254.6231
734948.23956,    4280872.32880
738065.61283,    4280594.42309
737956.86734,    4279615.71301
737993.11520,    4279349.89058
738041.44675,    4279156.56499
738138.11013,    4278878.65975
734661.34230,    4279254.6231
END
END

(Note: in the last example, the first and last set of x,y coordinates are identical. Why is that?)
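
Files like Examples 1 and 3 are easy to write from a program. The following is a minimal sketch that formats point and polygon input in the layout shown above; the function names and the sample coordinates are hypothetical, and the number of decimal places is an arbitrary choice.

```python
def generate_points(points):
    """points: list of (id, x, y). Returns text laid out like Example 1."""
    lines = ["%d,%.4f,%.4f" % (pid, x, y) for pid, x, y in points]
    lines.append("END")
    return "\n".join(lines) + "\n"

def generate_polygons(polygons):
    """polygons: list of (id, [(x, y), ...]). Returns text laid out like Example 3."""
    out = []
    for pid, ring in polygons:
        ring = list(ring)
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # repeat the first vertex as the last, as in Example 3
        out.append(str(pid))
        out.extend("%.5f, %.5f" % (x, y) for x, y in ring)
        out.append("END")         # ends this polygon
    out.append("END")             # ends the file
    return "\n".join(out) + "\n"

print(generate_points([(1, 77.0033, 5.1230), (2, 75.0345, 4.1266)]))
print(generate_polygons([(1, [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])]))
```

The resulting text can be saved to a file and supplied to GENERATE as the existing file of x,y coordinates mentioned above.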

In ArcView, one can generate point or route themes from data contained within ASCII or DBASE (*.dbf) files through the Add Event Theme option in the View Menu.

Adding an Event Theme in ArcView using voucher specimen data from the Missouri Botanical Garden.

        In this demonstration you will create a point theme from the file colpltsxy.txt, which contains real world locations of a subset of plants collected in Colombia and deposited at the Missouri Botanical Garden. Coordinates are in decimal degrees so that they are compatible with the already existing Digital Chart of the World or ArcWorld coverages. The file colgis.dbf contains related attribute information, such as plant-id, senior collector's name, voucher number, elevation (often in ranges), and plant species name. Following addition of a point theme in ArcView, you can then join attribute information using Add Table and Join commands in ArcView.

        It is important to remember that these are only a subset of the data available from the Missouri Botanical Garden's TROPICOS database. Criteria for selection were that the specimen was from Colombia, was identified to species, and had recorded elevation, latitude, and longitude. The database was then searched until approximately 235 appropriate records were found.
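
        The join between the coordinate table and the attribute table works like a keyed lookup on the shared plant-id. A minimal sketch with hypothetical records (the field names follow those used in this exercise; the values are invented for illustration only):

```python
def join_tables(target_rows, source_rows, key):
    """Attach the fields of the matching source row to each target row."""
    lookup = {}
    for row in source_rows:
        lookup.setdefault(row[key], row)  # keep the first match for each key
    joined = []
    for row in target_rows:
        merged = dict(row)
        merged.update(lookup.get(row[key], {}))  # unmatched rows stay as they are
        joined.append(merged)
    return joined

# Hypothetical rows standing in for colpltsxy.txt and colgis.dbf.
locations = [
    {"Plant_id": 1, "Longitude": -74.5, "Latitude": 4.26},
    {"Plant_id": 2, "Longitude": -75.1, "Latitude": 10.4},
]
attributes = [
    {"Plant_id": 1, "Collector": "Collector A", "Elevation": "1200-1400"},
    {"Plant_id": 2, "Collector": "Collector B", "Elevation": "50"},
]
for row in join_tables(locations, attributes, "Plant_id"):
    print(row)
```

        Elevation is kept as text here because, as noted above, it is often recorded as a range.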

To create the plant point theme:

>>>Open a new project and add the following theme and tables to an empty view:

tables: colpltsxy.txt and colgis.dbf
theme: colombia (polygon theme)
       These files and theme are located in the /arcstuff/arcdata/gisclass/createdata directory.

>>>Save the project in your /arcstuff/arcdata/gisclass/s00/xxxxxx directory as createdata.apr.

>>>Open the table colpltsxy.txt in Text Editor (select Programs and Text Editor from the Workspace Menu).

Note the file's structure in which each line of data has three numbers separated by commas: the first number is a unique plant-id value, and the next two numbers are longitude and latitude in decimal degrees, respectively. Western and southern hemisphere values are negative in decimal degrees. The structure of this table differs from the one needed to create a point coverage in the ARC/INFO GIS software program (see above). ArcView tables must have a first line which identifies the variable names. Also the final line is a row of data and not the word "end". Note: decimal degrees = degrees + minutes/60 + seconds/3600. In this example, the attribute information is separated into another table. These could be together in the same table, and the table could be either an ASCII or DBASE file.
>>>Close the Text Editor and return to the ArcView project.

>>>Edit the legend for Colombia so that a single clear polygon is visible.

>>>To create a new point theme, select "Add Event Theme" from the View Menu.

An Add Event Theme window opens with the Y.X button highlighted.
>>>Select colpltsxy.txt as the Table, Longitude as the X field and Latitude as the Y field. Then click on OK.
A new theme, colpltsxy.txt, appears in the view window.
>>>Change the marker symbol size to 6 in the Legend Editor. Display colpltsxy.txt and zoom in on the northern coast of Colombia.
Are there errors in the plant dataset? If so, what are those errors and how did they happen?
>>>Open the Attributes of Colpltsxy.txt table.

>>>Join the table containing attribute information, colgis.dbf, to the Attributes of Colpltsxy.txt table using the Plant_id fields.

>>>Use the Legend Editor to display the Colpltsxy.txt theme using Unique Value and the Collector field.

>>>Save the ArcView project and exit ArcView.
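
The note on the structure of colpltsxy.txt in the steps above can be sketched in code. This is a minimal illustration, assuming field names of Plant_id, Longitude and Latitude in the header line (the exact spelling in the class file may differ); both function names are hypothetical.

```python
def dms_to_decimal(degrees, minutes=0, seconds=0, hemisphere="N"):
    """decimal degrees = degrees + minutes/60 + seconds/3600; W and S are negative."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("W", "S") else value

def generate_to_event_table(generate_text, header="Plant_id,Longitude,Latitude"):
    """Turn GENERATE-style point input (id,x,y ... END) into an event table:
    a header line naming the fields, then one row per point, with no END line."""
    rows = [header]
    for line in generate_text.splitlines():
        line = line.strip()
        if not line or line.upper() == "END":
            continue
        rows.append(",".join(part.strip() for part in line.split(",")))
    return "\n".join(rows) + "\n"

print(dms_to_decimal(74, 30, 0, "W"))   # -74.5
print(generate_to_event_table("1,-74.5, 4.26\nEND\n"))
```

The second function only changes the layout of the file; it does not check that the coordinates are plausible.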
 

GETTING DIGITAL DATA FROM THE WEB

        An increasing amount of digital data is becoming available through the Web. The quality of these data is variable, and one should be very careful in using them. Downloading data is a little tricky because coverages in ARC/INFO format are made up of multiple files that are stored in two or more folders. However, coverages, grids, and other spatial data can be condensed into a single interchange file using the Export command in ARC/INFO. If needed, this file can be further compressed using gzip or other compression software to reduce its size.
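
        If a compressed interchange file ends up on disk still in gzip form, it can be unpacked with any gzip-compatible tool. A minimal sketch using Python's standard gzip module (the function name and the file names in the comment are only examples):

```python
import gzip
import shutil

def gunzip(src_path, dst_path):
    """Decompress a gzip file (e.g. cover.e00.gz) into dst_path (e.g. cover.e00)."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

# Example call, assuming the compressed file is in the current directory:
# gunzip("dnrparks.e00.gz", "dnrparks.e00")
```

        The uncompressed *.e00 file is plain text and can be inspected in a text editor before it is imported.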

        In the following exercise, we will download coverages of the ecological sections and state parks of Missouri and import these interchange files into ArcView. The files can be obtained either by downloading directly off the Web page or by using the Get command at an ftp prompt.

To Download Data from the Web Page:

1.  ARC/INFO interchange files (*.e00 files)

>>>Open Netscape.

Netscape is located under Local Apps in the Workspace Menu.
>>>Navigate to the Missouri Spatial Data Information Service home page by typing in the following address in the Netsite box: http://msdis.missouri.edu.

>>>Select Data from the available selections at the top of the page.
>>>Select the /pub/ directory from the FTP Site list (on the left).
            This will move you to the /pub directory.

The files we want are located in the /pub/state/natural directory.

>>>Click on the state/ folder to move to this directory. Then click on the natural/ folder to reveal a set of ARC/INFO interchange files.

Interchange files have the cover_name followed by the .e00 extension. The additional .gz extension means that this file has been compressed using the gzip program.
>>>Open the dnrparks.e00.gz file by clicking on the file name.
The contents of the interchange file are now visible in the uncompressed (*.e00) form and can be downloaded directly.
>>>Select "Save As..." from the File Menu in Netscape.
In the box titled Filter (at the top of the Save As window), you can direct the file to a desired directory by typing the full pathname, leaving the generic extension as *.e00.
>>>To save the file to your student directory enter /arcstuff/arcdata/gisclass/s00/xxxxxx/*.e00 in the box titled Filter.

>>>Click on the Filter button at the bottom of the Save As window.

The Selection box near the bottom of the Save As window should now have this pathname followed by dnrparks.e00. If not, retype in the Selection box the path /arcstuff/arcdata/gisclass/s00/xxxxxx/dnrparks.e00.
>>>Click OK to transfer the file to your account.

>>>Click on Back in the upper left and repeat the process for the file secosects.e00.gz.

>>>Exit Netscape.

To import the interchange file in ArcView:

        If you are using ArcView on a PC, there is a special IMPORT Utility that allows you to select the interchange file and have it translated into a theme. This theme can then be brought into an ArcView project. For ArcView operating on a UNIX platform, this must be done by typing a command.

>>>Open a command tool window (Select Command Tool under Programs in the Workspace Menu).

>>>Move to the "bin" directory of ArcView: cd /arcstuff/arcview3.new/bin

>>>Check your location by typing pwd and then type ls to see the contents of the bin folder.

The import command allows an ARC/INFO export file (*.e00) to be imported into ArcView. It has the form: import <path-name of the *.e00 file> <path-name of the new theme>

>>>Import the coverage to your own directory by entering the following command on one line:

 import   /arcstuff/arcdata/gisclass/s00/xxxxxx/dnrparks.e00    /arcstuff/arcdata/gisclass/s00/xxxxxx/dnrparks
Enter this command on one line, with a space separating "import" from the full path name of the *.e00 file and another space separating that path from the full path name of the new theme.

>>>Repeat the import steps for the secosects.e00 interchange file.  Name this new coverage secosects and store it in your /arcstuff/arcdata/gisclass/s00/xxxxxx directory.
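
When several interchange files need to be imported, the command line described above can be assembled by a small script. A minimal sketch, assuming the new theme takes the same name and directory as the *.e00 file unless another path is given (the function name is hypothetical, and the script only builds the command; it does not run it):

```python
import os

def import_command(e00_path, theme_path=None):
    """Return the argument list for: import <e00 file> <new theme path>."""
    if theme_path is None:
        # Default: same directory and name as the .e00 file, minus the extension.
        theme_path, _ = os.path.splitext(e00_path)
    return ["import", e00_path, theme_path]

cmd = import_command("/arcstuff/arcdata/gisclass/s00/xxxxxx/dnrparks.e00")
print(" ".join(cmd))
```

The printed line matches the form typed in the command tool window; the same list could be handed to subprocess.run from the ArcView bin directory, although that has not been tried here.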

You can now view the downloaded digital data in ArcView:

>>>Open the project createdata.apr in ArcView.

>>>Add the dnrparks and secosects themes to a new View window.

>>>Display the Secosects theme by selecting Unique Value in Legend Type and Secname for the Values Field.

>>>Display the Dnrparks theme by selecting Unique Value in Legend Type and Name for the Values Field.

>>>Save createdata.apr.

2. USGS Digital Elevation Models (DEM):

    A USGS DEM file is a raster data set representing the surface of the earth.  In ArcView 3.x versions, standard USGS 7.5-minute, 1-degree, or any other file in USGS DEM format can be imported into ArcView projects.  Digital terrain data covering most regions in the United States are available in the USGS DEM format from the U.S. Geological Survey (see the National Geospatial Data Clearinghouse site).  Digital terrain data for Missouri are available at the Missouri Spatial Data web page as ARC/INFO interchange files (i.e., *.e00).

    USGS DEM files can only be imported using the Spatial or 3D Analyst extensions in ArcView.

To import a DEM file into an ArcView project after it has been downloaded from the web:

>>>Start ArcView or return to the already opened createdata.apr project.

>>>Load the Spatial Analyst extension by selecting Extensions from the File menu while the Project window is active.

>>>Open a new view.

>>>Import the DEM file by selecting "Import Grids" from the File menu while the View window is active.

An "Import file type" window pops open with options to import ASCII Raster, Binary Raster, USGS DEM, or US DMA DTED file.
>>>Select "USGS DEM" as the file type and click OK.
>>>Navigate to the /arcstuff/arcdata/gisclass/createdata/ directory and highlight the lacey.dem file; this DEM is from an area within the state of Pennsylvania.  Click OK to import the DEM.
An "Import USGS DEM files" window pops open asking you to provide a grid name to the imported DEM file.  The default name is imgrd1 (image grid1).  ArcView translates the DEM file into a grid in ARC/INFO format.
>>>Name the imported grid lacey and store it in your /arcstuff/arcdata/gisclass/s00/xxxxxx directory.  The DEM is then converted to a grid, but is not directly added into your View window.

>>>Add the lacey grid theme to the View and display the elevation data.  Remember to select Grid Data Source as the data source type in the Add Theme window.

>>>Save createdata.apr and exit ArcView.

The Missouri Spatial Data web site has a wealth of spatial data.  You might want to spend some time reading about the data available at the site.  Two other coverages exist as ARC/INFO interchange files in the createdata directory.  One contains digital terrain data for Lincoln County, Missouri (lincoln.e00), while the other contains point localities for Timber Rattlesnakes in Missouri (tim_rattle-pt.e00).  Additional DEM coverages or amphibian/reptile locality data can be found on the Missouri Spatial Data web site.

Suggested exercises:

1. Create your own text file from locational data to create point themes.

2. Download additional Missouri coverages from the MSDIS web site and import these into an ArcView project - try downloading a DEM raster file for one of the Missouri counties, or a coverage for one of the Missouri reptiles or amphibians (under pub/state/natural/vertebrate/sherp).  These files will import in the same way as dnrparks.e00 and secosects.e00.
