Codifying collaborative knowledge: using Wikipedia as a basis for automated ontology learning

  • Article
Knowledge Management Research & Practice

Abstract

In the context of knowledge management, ontology construction can be considered part of capturing the body of knowledge of a particular problem domain. Traditionally, ontology construction assumes a tedious codification of the domain experts' knowledge. In this paper, we describe a new approach to ontology engineering that has the potential to bridge the dichotomy between codification and collaboration by turning to Web 2.0 technology. We propose to shift the primary source of ontology knowledge from the expert to socially emergent bodies of knowledge such as Wikipedia. Using Wikipedia as an example, we demonstrate how the core terms and relationships of a domain ontology can be distilled from this socially constructed source. As an illustration, we describe how our approach achieved over 90% conceptual coverage compared with gold-standard hand-crafted ontologies, such as Cyc. What emerges is not a folksonomy, but rather a formal ontology that has nonetheless found its roots in social knowledge.

Notes

  1. http://www.wikipedia.org

Author information

Corresponding author

Correspondence to David G Schwartz.

About this article

Cite this article

Guo, T., Schwartz, D., Burstein, F. et al. Codifying collaborative knowledge: using Wikipedia as a basis for automated ontology learning. Knowl Manage Res Pract 7, 206–217 (2009). https://doi.org/10.1057/kmrp.2009.14

  • DOI: https://doi.org/10.1057/kmrp.2009.14
