
COD

KnOwledge and Decision

Team leader

Administrative assistant

Elodie GUIDON
Phone +33(0)2 51 12 58 70

Team Presentation

The COD team is built around three major research areas:

  • data mining (unsupervised mining of association and classification rules) and learning (probabilistic graphical models),
  • knowledge engineering,
  • knowledge visualisation.

The multidisciplinary aim of the team is to improve the efficiency of data mining and learning algorithms, in terms of both complexity and "actionability", by integrating knowledge of the field and/or of the users. This integration is achieved either by coupling the algorithms with knowledge models (ontologies) or by interacting with the user through suitable interactive visual supports.

Research directions

The evolution of data analysis towards data mining at the beginning of the 1990s was marked by a change in the scale of the data handled. The central question for the precursors of data mining was how to find (“to mine”) potentially useful information among ever-growing masses of data. Two decades after the publication of the US manifesto on the extraction of knowledge from data [Frawley 92], not only has the scale of the data increased significantly, but the data itself has also undergone profound changes. This new evolution is reflected in the increased complexity of the data processed: the data are no longer series of standard records from relational databases, but data whose transformation into the conventional Individuals × Variables form is more complex. Data mining has been transformed into “mining of complex data” and even, as a result of the Semantic Web in particular, into “knowledge mining”. The COD team’s research has followed these changes and focuses on three themes that feed on each other’s findings: data mining and relation learning, ontology engineering, and knowledge visualisation.


Data mining and rule learning

Identifying the relations that link phenomena, whether they are natural or arise from human activities or artificial systems, is the key to understanding them. These relations can describe various situations, from the concomitance of two phenomena to causality, where the antecedent is the cause and the consequent the effect; causality is often given precedence in research because of its potential predictive capacity. Our work is mainly centred on the analysis of asymmetrical dependence, and our research takes two directions: (1) the exploratory mining of association rules, and (2) the learning of probabilistic graphical models.

In data mining, association rules of the type “if a and b are present then c is generally also present”, introduced to express implicational tendencies between attributes in a relational table, quickly came into intensive use. The priority objective is to extract the rules that are “surprising” and potentially interesting for the user. The obstacle here is the considerable volume of rules generated by conventional automatic algorithms, which does not lend itself easily to interpretation. To overcome this difficulty, we have followed two different research streams. The first stream, which is statistical, consists in defining measures, called quality measures, which quantify the relevance of the rules and make it possible to filter them, and in structuring the extracted rules by classifying them with methods adapted to asymmetrical data. “Post mining” has taken over from “data mining”. The second stream, which is more recent, has its roots in artificial intelligence: it aims to filter the rules by introducing knowledge via knowledge models, in a semi-supervised mode that allows the user to play a heuristic role in the exploration of the rule space via adapted interactive visual supports.
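As an illustration of how quality measures can quantify and filter rules, the sketch below computes three standard measures (support, confidence and lift) for the rule “if a and b then c” on a toy transaction table. The choice of these three measures, the function names and the data are assumptions made for the example; the text above does not say which measures the team actually uses.

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_measures(
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
    transactions: List[Transaction],
) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    s_a = support(antecedent, transactions)
    s_c = support(consequent, transactions)
    s_ac = support(antecedent | consequent, transactions)
    # Confidence is asymmetrical: it estimates P(consequent | antecedent).
    confidence = s_ac / s_a if s_a > 0 else 0.0
    # Lift compares the confidence with the base rate of the consequent.
    lift = confidence / s_c if s_c > 0 else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical toy data: five transactions over the items a, b, c.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"c"}),
]

measures = rule_measures(frozenset({"a", "b"}), frozenset({"c"}), transactions)
print(measures)
```

On this data the rule has a lift below 1, so a filter that keeps only rules with lift above 1 would discard it even though its confidence is fairly high. This is the kind of pruning that post mining relies on.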

In learning, Bayesian networks, which arose from the convergence of statistics and artificial intelligence, are probabilistic graphical models whose structure can represent direct causal relations or the presence of latent variables. Learning the structure of a Bayesian network enables the discovery of new knowledge, which is sometimes more useful to the expert than the model itself. Theoretical results on the asymptotic properties of these networks, together with successful results obtained over the last decade in an increasing number of varied applications, have contributed to their significant growth. Our work is aimed mainly at developing learning algorithms that take account of the difficulties encountered in many applications, in particular where n << p (few observations compared with the number of variables). We follow a stream similar to the one adopted for association rules, which aims to guide the learning of the network structure by means of knowledge models.

Ontology engineering

The problem of knowledge representation posed by the pioneers of artificial intelligence has become a major issue in new information and communication systems. The formalisms on which the representations are based determine both the types of knowledge that can be represented and the reasoning mechanisms that can be applied. With the growth of the Semantic Web, representations by ontologies have gained prominence in the knowledge engineering community; an ontology is often defined as a conceptualisation, according to a point of view imposed by the applications, of the objects of a specific field and of the structuring relations between them. One of the major issues remains the operational construction of these ontologies, all the more so as the volume of concepts and relations they contain has increased considerably in the last few years, from a few hundred to several thousand in the various application fields. We address this issue from two different angles. From an experimental point of view, we construct ontologies associated with real applications and attempt to develop a methodology. From a more theoretical point of view, our work focuses both on extending the conventional subsumption hierarchy model, in particular by taking axioms into account (“heavyweight ontologies”), and on analysing and developing semantic similarity measures that make it possible to compare concepts in the same ontology or in different ontologies.
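To make the notion of a semantic similarity measure within a subsumption hierarchy concrete, here is a minimal sketch of one well-known measure, the Wu-Palmer similarity, on a toy taxonomy. The measure is chosen only for illustration and is not stated to be the one studied by the team; the taxonomy, the function names and the single-parent assumption are likewise hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy subsumption hierarchy: each concept maps to its parent.
# Assumption: single inheritance, with "entity" as the unique root.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def path_to_root(concept: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Concepts from `concept` up to the root, `concept` first."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def depth(concept: str, parent: Dict[str, Optional[str]]) -> int:
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept, parent))


def wu_palmer(c1: str, c2: str, parent: Dict[str, Optional[str]]) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    ancestors_of_c2 = set(path_to_root(c2, parent))
    # The first ancestor of c1 that also subsumes c2 is the deepest common one.
    lcs = next(n for n in path_to_root(c1, parent) if n in ancestors_of_c2)
    return 2 * depth(lcs, parent) / (depth(c1, parent) + depth(c2, parent))


print(wu_palmer("dog", "cat", PARENT))  # siblings under "animal"
print(wu_palmer("dog", "oak", PARENT))  # related only through the root
```

Concepts that share a deep common subsumer score closer to 1 than concepts related only through the root. Comparing concepts across different ontologies would additionally require an alignment between the two hierarchies, which this sketch does not cover.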


Visual analytics

The growth in knowledge visualisation, or visual analytics, presented by its current protagonists as an interdisciplinary field, finds its basis in a long tradition (e.g. Bertin’s theory of graphics, 1967, or Tukey’s exploratory data analysis, 1977) which highlights the users’ need for a coupling between data mining and visualisation. Based on the observation that current knowledge extraction methods cannot be applied within an intuitive, rapid and interactive framework, the aim is to go beyond visualisation as a simple visual representation of the results obtained by automatic algorithms: “visual analytics is more than visualization and can rather be seen as an integrated approach combining visualization, human factors and data analysis”. It is therefore necessary to rely on recent technologies (e.g. programming languages, physical supports and effectors, programmable graphics cards) to develop new approaches to visual data exploration which include the user in the mining process. Our research falls within this stream, and the position we have chosen is to use 3D and immersive environments. These approaches, which have not yet been developed to a great extent in the data mining community, rely on technologies which are now spreading rapidly.

Embedding preferences

One of our objectives is to embed the user’s preferences in the knowledge extraction phase. In this broad framework, we focus on two research directions: the modeling of the user’s preferences by means of multicriteria decision aiding techniques, and the embedding of these preferences in post mining algorithms by means of adapted interactive visual interfaces, the user’s preferences then playing the role of a heuristic.

  • Multicriteria decision aiding.
    The aim here is to take into account different and often contradictory points of view. In the context of multiattribute utility theory, each alternative is summarised by a global score obtained by aggregating its partial scores along each point of view. Comparing two alternatives is then equivalent to comparing their global scores.
  • Interactive visualization.
    This approach consists in embedding the subjective experience of the user in the post mining process, which here takes the form of navigation in a virtual space. In the framework of association rule mining, we are interested in the formal representations that can be used to define the navigation space (“rule space”), in the corresponding visual representations and in the implementation of the interactive rule mining process.
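The aggregation described in the multicriteria decision aiding item above can be sketched as follows, assuming the simplest additive model: the global score is a weighted sum of the partial scores. The weighted sum, the criteria names, the weights and the two alternatives are all assumptions made for illustration; multiattribute utility theory also admits other aggregation functions.

```python
from typing import Dict


def global_score(partial_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate partial scores into one global score (additive model, assumed)."""
    if partial_scores.keys() != weights.keys():
        raise ValueError("partial scores and weights must cover the same criteria")
    return sum(weights[c] * partial_scores[c] for c in weights)


def preferred(a: Dict[str, float], b: Dict[str, float], weights: Dict[str, float]) -> str:
    """Compare two alternatives by comparing their global scores."""
    score_a, score_b = global_score(a, weights), global_score(b, weights)
    if score_a > score_b:
        return "a"
    if score_b > score_a:
        return "b"
    return "tie"


# Hypothetical user preferences: weights express the importance of each
# point of view and sum to 1.
weights = {"support": 0.2, "confidence": 0.5, "novelty": 0.3}

# Two hypothetical rules with contradictory partial scores, all in [0, 1].
rule_1 = {"support": 0.9, "confidence": 0.6, "novelty": 0.2}
rule_2 = {"support": 0.3, "confidence": 0.8, "novelty": 0.9}

print(global_score(rule_1, weights), global_score(rule_2, weights))
print(preferred(rule_1, rule_2, weights))
```

Here rule_1 wins on support while rule_2 wins on confidence and novelty; the weights resolve the conflict in favour of rule_2. In an interactive setting the user could adjust the weights and watch the ranking change.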

Last update: Saturday 8 March 2014