University of California, Riverside UCR


ITR/IM: Handling Uncertainty in Spatial Databases

IIS-0114036

Principal Investigator

Bir Bhanu, Center for Research in Intelligent Systems Bourns B232, University of California at Riverside, Riverside, CA 92521, Tel. 951-827-3954, Fax. 951-827-3188
bhanu@cris.ucr.edu
http://www.vislab.ucr.edu/PEOPLE/BIR_BHANU/index.htm

Co-PI

Chinya Ravishankar, Center for Research in Intelligent Systems , Dept. of Computer Science and Engineering, University of California at Riverside, Tel. 951-827-2451, Fax. 951-827-4643
ravi@cs.ucr.edu
http://www.cs.ucr.edu/~ravi

Keywords: Uncertainty Modeling, Spatial Databases, Feature Extraction in Images, Digital Maps, Approximate Queries, Indexing Structures

Research and Education Activities

The goal of this research project is to develop new techniques to handle uncertainty in databases. This research effort develops new theory, tools and technology to support various types of probabilistic uncertainty in databases. The experimental research is linked to spatial databases with applications in the field of Geographic Information Systems. By adding support for uncertainty in database management systems, this research project will increase the power and flexibility of database management in a broad class of applications.

There are many potential application domains, but two specific areas of interest are automated population and management of dynamic geospatial databases in both mapping and surveillance scenarios. Today, various optical, infrared and radar sensors are repeatedly observing large regions, generating vast quantities of multi-media imagery. Image analysis algorithms are emerging that can categorize terrain features of interest (such as crops, woodlands, etc.) or objects (buildings, vehicles, and people) with varying degrees of confidence. Many of these features of interest are inherently dynamic (such as crops, rivers, sea ice, etc.) with long-term, seasonal and short-term variations. Results from computer algorithms involve some uncertainty due to variations and noise in the input, as well as inherent errors that produce false alarms. In addition, the performance of these algorithms may not degrade gracefully, leading to 'data' existence problems. Uncertainty in map databases can arise from several sources. First, changes in the real world can cause information to become out of date, even if temporarily. Without efficient mechanisms for managing uncertainty-related metadata, it is extremely hard either to improve the quality of query responses or to manage the data quality. Second, much of the data is acquired using automated image processing techniques on satellite images. Image processing techniques produce features that have significant amounts of uncertainties.

Our activities have focused on spatial join under uncertainty, modeling uncertainty for spatial objects and the development of a hierarchical approach for retrieval of uncertain objects. We have built a small prototype database of uncertain spatial objects that are extracted from color video. It consists of regions and their properties. We also had technical meetings ESRI and have acquired their commercial packages. These programs are running in our lab. Our database research software developed under this program will be running on the top of this software. We have used Census TIGER/Line data and Mojave Desert Ecosystem Spatial Data as the test case.

Findings

(a) The modeling of spatial objects with uncertain boundaries and evaluating spatial queries over them is an open issue. We have developed a method for performing spatial joins over uncertain spatial data. We have also proposed a probabilistic spatial data model to model the positional uncertainty of polygons. Based on this model, we developed the PrR-tree, a probabilistic index structure to support probabilistic filtering, and an algorithm for the refinement step. Our experiments demonstrate that our approach achieves higher accuracy for probabilistic spatial join queries, while reducing the overall cost by a factor of more than two.

(b) Managing and manipulating uncertainty in spatial databases are important problems for various practical applications. We propose a probability-based method to model and index uncertain spatial data where every object is represented by a probability density function (PDF). To index PDFs, we construct an optimized Gaussian mixture hierarchy (OGMH) and two variants of uncertain R-tree. We provide a comprehensive comparison among these three indices and plain R-tree on TIGER/Line Southern California landmark point dataset. We find that uncertain R-tree is the best for fixed query and OGMH is suitable for both certain and uncertain queries. Moreover, OGMH is suitable not only for spatial databases, but also for multi-dimensional indexing applications like content based image retrieval, where R-tree is inefficient in high dimensions.

(c) Consideration of uncertainty in manipulation and management of spatial data is important. Unlike traditional fuzzy approaches we use a probability-based method to model and index uncertain data in the application of Mojave Desert endangered species protection. The query is a feature vector describing the habitat for certain species, and we are interested in finding geographic locations suitable for that species. We select appropriate layers of the geo-spatial data affecting species life, called habitat features, and model the uncertainty for each feature as a probability density function (PDF). We partition the geographic area into grids, assign an uncertain feature vector to each cell, and develop a filter-and-refine indexing method. The filter part is a bottom-up binary tree based on the automated clustering result obtained using the EM algorithm. The refine part processes the filtered results based on the 'similarity' between the query and properties of each cell. We compare the performance of our proposed indexing structure with the intersection method from Mojave Desert Ecosystem Program (MDEP); our method is more selective and efficient.

(d) As a commonly used unsupervised learning algorithm in spatial data analysis, Expectation-Maximization (EM) algorithm has several limitations, especially in high dimensional feature spaces where the data are limited and the computational cost varies exponentially with the number of feature dimensions. Moreover, the convergence is guaranteed only at a local maximum. We propose a unified framework of a novel learning approach, namely Coevolutionary Feature Synthesized Expectation-Maximization (CFS-EM), to achieve satisfactory learning in spite of these difficulties. The CFS-EM is a hybrid of coevolutionary genetic programming (CGP) and EM algorithm. The advantages of CFS-EM are: 1) it synthesizes low-dimensional features based on CGP algorithm, which yields near optimal nonlinear transformation and classification precision comparable to kernel methods such as the support vector machine (SVM); 2) the explicitness of feature transformation is especially suitable for image retrieval because the images can be searched in the synthesized low-dimensional space, while kernel-based methods have to make classification computation in the original high-dimensional space; 3) the unlabeled data can be boosted with the help of the class distribution learning using CGP feature synthesis approach. Experimental results show that CFS-EM outperforms pure EM and CGP alone, and is comparable to SVM in the sense of classification. It is computationally more efficient than SVM in query phase. Moreover, it has a high likelihood that it will jump out of a local maximum to provide near optimal results and a better estimation of parameters. (Paper 3 in the attachment)

(e) We developed an integrated approach to learning for image retrieval in dynamic databases that incorporates a statistical mixture model of the data, relevance feedback and long term continuous learning to improve retrieval performance. The concept learning is incrementally refined with increased retrieval experiences with multiple users and different concepts. The concept knowledge can be transplanted to deal with dynamic database conditions such as insertion of new images, removal of existing images and query images that are outside of the current database. The approach can also deal with cases where users may mislabel some images during relevance feedback. We model the image distribution in feature space as a mixture of Gaussian densities. The task of concept learning is then accomplished by estimating the mixture model parameters. From a theoretical point of view, the mixture-based analysis defines a strict mathematical model, provides a test for the validity of cluster structure and a powerful analysis tool for concept learning in image databases.

We have acquired commercial packages developed by ESRI. We have used Census TIGER/Line data, Mojave Desert ecosystem database, MSTAR SAR data and our visual image database. We have focused on the Mojave Desert Ecosystem to demonstrate our ideas. This will provide several real-world examples where modeling uncertainty makes a difference. We have developed a simple interface for uncertain queries within ArcGIS. We plan to integrate indexing software with this interface. We are extending our work to spatio-temporal domain and developing uncertainty models for accomplishing multi-level queries and propagation of uncertainties.

Training and Development

Graduate students Jinfeng Ni, Anlei Dong and Rui Li are partially supported on the project. Jingfeng Ni and Rui Li have passed the qualifying examinations and Rui Li has advanced to candidacy. Anlei Dong completed his Ph.D. in Dec, 2004. Parts of the research accomplished to date will be integrated in database and computer vision and database courses. Undergraduate student Michael Kurth has experimented with ESRI GIS products, has brought t he Mojave Desert Ecosystem spatial database on line and interfaced the uncertainty related software with ESRI products. He is a coauthor on a couple of conference papers and one journal paper. Jenny Deng, another undergraduate student, learned Bayesian nets an associated computing tools during the brief period of her involvement with the project. For this project we have established cooperation with Dr. Erik Hoel of ESRI. We had a couple of meetings at UCR and ESRI. This cooperation will allow our research to impact the real applications in the GIS area.

Publications and Products

  • B. Bhanu and Y. Lin, "Synthesizing feature agents using evolutionary computation", Pattern Recognition Letters. Special Issue on Remote Sensing, Also a version in International Workshop on Pattern Recognition in Remote Sensing, vol. 25, (2004), p. 1519-1531. Published
  • J. Ni, C.V. Ravishankar and B. Bhanu, "Probabilistic Spatial Database Operations", Advances in Spatial and Temporal Databases, Lecture Notes in Computer Science, vol. 2750, (2003), p. 140-158. Published
  • A. Dong and B. Bhanu, "A New Semi-Supervised EM Algorithm for Image Retrieval", IEEE CS Conference on Computer Vision and Pattern Recognition, vol. II, (2003), p. 662-667. Published
  • A. Dong and B. Bhanu, "Concept Learning and Transplantation for Dynamic Image Databases", IEEE International Conference on Multimedia and Expo, vol. , (2003), p. 765-768. Published
  • A. Dong and B. Bhanu, "Active Concept Learning for Image Retrieval in Dynamic Databases", International Conference on Computer Vision, vol. , (2003), p. 90-95. Published
  • B. Bhanu, R. Li, C. Ravishankar, M. Kurth and J. Ni, "Indexing Structure for Handling Uncertain Spatial Database", 6th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, vol. , (2004), p. CD. Published.
  • B. Bhanu, R. Li, C. Ravishankar and J. Ni, "Handling Uncertain Spatial Data: Comparisons between Indexing Structures", Third International Workshop on Pattern Recognition in Remote Sensing, vol. , (2004), p. CD. Published
  • R. Li, B. Bhanu, C. Ravishankar, M. Kurth and J. Ni, "Uncertain Spatial Data Handling: Modeling, Indexing and Query", IEEE Transactions on Geoscience and Remote Sensing, vol. , (2005), p. . Submitted
  • R.Li, B. Bhanu and A. Dong, "Coevolutionary feature synthesized EM algorithm for image retrieval", ACM Multimedia Conference, vol. , (2005), p. 1-10. Published
  • A. Dong and B. Bhanu, "Active Concept Learning in Image Databases", IEEE Trans. on Systems, Man and Cybernetics, Part B, vol. 35, (2005), p. 450-466. Published


 


Project Impact

Our findings have enhanced our abilities to (a) model uncertainties in spatial and image databases, (b) perform spatial join under uncertainty, (c) develop new indexing structures to handle uncertain spatial data, (d) build a model of image databases and learn concepts through user interaction and exploitation of meta knowledge. (e) develop a coevolutionary synthesized EM algorithm for image retrieval that is suited to handle large number of features. Our research advances the fields of databases, computer vision and machine learning.

Our indexing structure OGMH is suitable not only for spatial databases, but also for multi-dimensional indexing applications like content based image retrieval, where R-tree is inefficient in high dimensions. Our coevolutionary feature synthesized EM algorithm allows to use EM algorithm in high dimensional feature spaces in a highly efficient manner.

The Mojave Desert eco-region extends from eastern California to northwestern Arizona, southern Nevada, and southwestern Utah. There are hundreds of endangered species. Desert tortoise (Gopherus agassizii) is one of them. It has inhabited this region for over one million years, but its population reduced dramatically during the last two decades due to both people and environment change (Humphrey 1987). It was listed as 'threatened' under the California Endangered Species Act in 1989 and the federal Endangered Species Act in 1990. Scientists have been trying to protect them from their extinction. They are interested in finding what factors impact desert tortoises and finding where desert tortoises live so they can protect the land. Mojave Desert Ecosystem Program (MDEP) is the first to organize a detailed, environmentally oriented, digital geographic database set over an entire eco-region. They have some progress on the tortoise protection. Based on some information about the tortoise habitats, MDEP researchers use ArcInfoń to select the suitable places as the tortoise habitats. After this pre-processing, the area of interest reduces a lot. As the last step, they go to these areas and see whether there are tortoises and try to protect them. This trial and error mode is time consuming, expensive and they have to try this over and over again if they are interested in species other than tortoise.

Using the index system to handle uncertainties that we have developed in this project we can find desert tortoise very efficiently ľ the area to be searched is reduced by more than 65%. This will save significant resources that are spent in the actual search that is carried out to find the endangered species.

The results of our research are applicable to other species and other sources of data where uncertainties are involved.

One student Anlei Dong completed his Ph.D. in Dec. 2004. Other graduate Ph.D. students are still working on their research. One undergraduate student (Michael Kurth) has coauthored a couple of conference papers and is a coauthor of a journal paper.


Area Background

There has been very little research in the database field that allows for uncertainties. A standard assumption for spatial databases has been that the positional attributes of spatial objects are known precisely. Today, various optical, infrared and radar sensors are repeatedly observing large regions, generating vast quantities of multi-media imagery. Image analysis algorithms are emerging that can categorize terrain features of interest (such as crops, woodlands, etc.) or objects (buildings, vehicles, and people) with varying degrees of confidence. Any of these features of interest are inherently dynamic (such as crops, rivers, sea ice, etc.) with long-term, seasonal and short-term variations. Results from computer algorithms involve some uncertainty due to variations and noise in the input, as well as inherent errors that produce false alarms. In addition, the performance of these algorithms may not degrade gracefully, leading to data existence problems. Uncertainty in map databases can arise from several sources. First, changes in the real world can cause information to become out of date, even if temporarily. Without efficient mechanisms for managing uncertainty-related metadata, it is extremely hard either to improve the quality of query responses or to manage the data quality. Second, much of the data is acquired using automated image processing techniques on satellite images. Image processing techniques produce features that have significant amounts of uncertainties.

Area References

  • K. Lowell (Editor), Spatial Accuracy Assessment: Land Information in Natural Resources, December 1999.
  • D. Barbara, H. Garcia-Molina and D. Porter, “The Management of Probabilistic Data,” IEEE Trans. on Knowledge and Data Engineering, (4)5, pp. 487-502, 1992.
  • L. Lakshmanan, N. Leone, R. Ross and V. Subrahmanian, “ProbView: A Flexible Probabilistic Database System,” ACM Trans. on Database Systems, (22)3, pp. 419-469, 1997.
  • J. Barros, J. French, W. Martin and P. Kelly, “System for Indexing Multi-Spectral Satellite Images for Efficient Content-Based Retrieval,” Proc. of the SPIE Conference on Storage and Retrieval for Image and Video Databases III, Vol. 2420, pp. 228-237, February 9-10, 1995.
  • M. Lo and C.V. Ravishankar, “Spatial hash-joins,” Proc. 1996 ACM SIGMOD International Conference on Management of Data, pp. 247-258, Montreal, Canada, June 3-6.
  • B. Bhanu, Y. Lin and K. Krawiec, “Synthesis of Pattern Recognition Systems,” Springer 2004.
  • C-S. Li and J. Turek, “Content-Based Indexing of Earth Observing Satellite Image Database with Fuzzy Attributes, Proc. SPIE, Vol. 2670, pp. 438-449, February 1996.
  • B. Bhanu and A. Dong, “Concepts learning with fuzzy clustering and relevance feedback,” Engineering Applications of Artificial Intelligence, Vol. 15, No. 2, pp. 123-138, 2002.

Potential Related Projects: We would be interested in collaborating with others who are interested in representing, modeling, and managing uncertainties in databases.

Project Website: http://www.vislab.ucr.edu/research/projects/husd/husd.html

Acknowledgement This material is based upon work supported by the National Science Foundation under ITR Grant No. 0114036. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.