COD KnOwledge and Decision
The COD team is built around three major research areas.
The multidisciplinary aim of the team is to improve the efficiency of data mining and learning algorithms, in terms of both complexity and "actionability", by integrating knowledge of the field and/or of the users. This integration is carried out either through coupling with knowledge models (ontologies) or through interaction with the user by means of suitable interactive visual supports.
The evolution of data analysis towards data mining at the beginning of the 1990s was marked by a change in the scale of the data handled. The central question for the precursors of data mining was to find ("to mine") potentially useful information among ever-growing masses of data. Two decades after the launch of the US manifesto for knowledge discovery in databases [Frawley 92], not only has the scale of data increased significantly, but the data itself has also undergone profound changes. This evolution translated into an increase in the complexity of the data processed: it was no longer a matter of standard records from relational databases, but of data whose conventional transformation into an Individuals × Variables table had become more complex. Data mining thus turned into "complex data mining" and even, driven in particular by the semantic web, into "knowledge mining". The COD team's research has followed these changes and is focused on three themes which feed on each other's findings: data mining and rule learning, ontology engineering, and knowledge visualisation.
Data mining and rule learning
Identifying the relations that link phenomena, whether natural, arising from human activities, or produced by artificial systems, is the key to understanding them. These relations can describe various situations, from the mere concomitance of two phenomena to causality, where the antecedent is the cause and the consequent the effect, a case often given precedence in research because of its predictive potential. Our work centres on analysing asymmetric dependences, and our research takes two directions: (1) exploratory mining of association rules, and (2) learning probabilistic graphical models.
In data mining, association rules of the type "if a and b are present, then generally c is also present", introduced to express implicational tendencies between attributes of a relational table, quickly came into intensive use. The primary objective is to extract the "surprising" and potentially interesting rules for the user. The obstacle here is the considerable volume of rules generated by conventional automatic algorithms, which does not lend itself easily to interpretation. To overcome this difficulty, we have followed two research streams. The first, statistical in nature, consists in defining measures, called quality measures, which quantify the relevance of the rules and make it possible to filter them, and in structuring the extracted rules by clustering them with classification methods adapted to asymmetric data. "Post-mining" has thus taken over from "data mining". The second, more recent approach has its roots in artificial intelligence: it aims to filter the rules by introducing knowledge via knowledge models, in a semi-supervised mode that lets the user play a heuristic role in the exploration of the rule space through adapted interactive visual supports.
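As a minimal sketch of the quality measures mentioned above, the classical support, confidence and lift of an association rule can be computed on a toy transaction database (the transactions and the rule below are hypothetical, for illustration only):

```python
# Hypothetical toy transaction database; each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset, db):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent): how often the rule holds
    among the transactions where it applies."""
    return support(antecedent | consequent, db) / support(antecedent, db)

def lift(antecedent, consequent, db):
    """Ratio of observed confidence to what independence would predict;
    values above 1 suggest a positive, potentially interesting dependence."""
    return confidence(antecedent, consequent, db) / support(consequent, db)

ant, cons = frozenset({"a", "b"}), frozenset({"c"})
print(support(ant | cons, transactions))    # 0.4
print(confidence(ant, cons, transactions))  # ~0.667
# The dependence is asymmetric: the reversed rule has a different confidence.
print(confidence(cons, ant, transactions))  # 0.5
```

Note that confidence is asymmetric (the last two prints differ), which is precisely why measures adapted to asymmetric dependences matter here.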
In machine learning, Bayesian networks, born of the convergence between statistics and artificial intelligence, are probabilistic graphical models whose structure can represent direct causal relations or the presence of latent variables. Learning the structure of a Bayesian network enables the discovery of new knowledge, which is sometimes more useful to the expert than the model itself. Theoretical results on the asymptotic properties of these networks, together with successful applications in an increasing number of varied domains over the last decade, have contributed to their significant growth. Our work aims mainly at developing learning algorithms that take account of the difficulties encountered in many applications, in particular when n << p (few observations compared with the number of variables). We follow a stream similar to the one adopted for association rules, which aims to guide the learning of the network structure by means of knowledge models.
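To give a concrete flavour of score-based structure learning, the sketch below compares two candidate structures over two binary variables (independent vs. an edge A → B) with the standard BIC score, on hypothetical data; real structure-learning algorithms search a much larger structure space, but each step relies on exactly this kind of scoring:

```python
import math
from collections import Counter

# Hypothetical binary dataset over two variables (columns: A, B).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (1, 0), (0, 0)]
n = len(data)

def loglik_marginal(col):
    """Log-likelihood of one variable under its empirical marginal distribution."""
    counts = Counter(row[col] for row in data)
    return sum(c * math.log(c / n) for c in counts.values())

def loglik_conditional(child, par):
    """Log-likelihood of `child` given `par`, using the empirical conditional table."""
    joint = Counter((row[par], row[child]) for row in data)
    par_counts = Counter(row[par] for row in data)
    return sum(c * math.log(c / par_counts[p]) for (p, _), c in joint.items())

def bic(loglik, n_params):
    """BIC score: fit minus a complexity penalty; higher is better."""
    return loglik - 0.5 * n_params * math.log(n)

# Structure 1: A and B independent (one free parameter per binary marginal).
score_indep = bic(loglik_marginal(0) + loglik_marginal(1), 2)
# Structure 2: edge A -> B (one parameter for A, two for B given each value of A).
score_edge = bic(loglik_marginal(0) + loglik_conditional(1, 0), 3)
print(score_indep, score_edge)
```

The penalty term is what makes the trade-off explicit: with few observations relative to the number of variables (the n << p regime mentioned above), richer structures are penalised more heavily, which is one motivation for guiding the search with prior knowledge.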
Ontology engineering
The problem of knowledge representation posed by the pioneers of artificial intelligence has become a major issue in new information and communication systems. The formalisms on which the representations are based determine both the types of knowledge that can be represented and the reasoning mechanisms that can be performed. With the growth of the semantic web, representations by ontologies have gained prominence in the knowledge engineering community; an ontology is often defined as a conceptualisation, from a point of view imposed by the applications, of the objects of a specific field and of the structuring relations between these objects. One of the major issues remains the operational construction of these ontologies, whose size has increased considerably in recent years, from a few hundred to several thousand concepts and relations across the various application fields. We address this issue from two angles. From an experimental point of view, we construct ontologies for real applications and attempt to derive a methodology from them. From a more theoretical point of view, our work focuses both on extending the conventional subsumption-hierarchy model, in particular by taking axioms into account ("heavyweight ontologies"), and on developing semantic similarity measures that make it possible to compare concepts within the same ontology or across different ontologies.
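One well-known semantic similarity measure of the kind discussed above is Wu-Palmer similarity, which compares two concepts through their depths and the depth of their least common subsumer in the hierarchy. A minimal sketch on a hypothetical toy subsumption hierarchy (a tree, for simplicity):

```python
# Hypothetical toy subsumption hierarchy: child -> parent; "thing" is the root.
parent = {
    "cat": "mammal", "dog": "mammal",
    "mammal": "animal", "bird": "animal",
    "animal": "thing",
}

def ancestors(c):
    """Path from a concept up to the root, inclusive, ordered bottom-up."""
    path = [c]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def depth(c):
    """Depth counted in nodes from the root (the root has depth 1)."""
    return len(ancestors(c))

def lcs(c1, c2):
    """Least common subsumer: in a tree, the first ancestor of c2
    (walking upward) that also subsumes c1 is the deepest shared one."""
    a1 = set(ancestors(c1))
    for a in ancestors(c2):
        if a in a1:
            return a

def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2*depth(lcs) / (depth(c1) + depth(c2)), in (0, 1]."""
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

print(wu_palmer("cat", "dog"))   # 0.75: siblings under "mammal"
print(wu_palmer("cat", "bird"))  # lower: related only through "animal"
```

Extending such measures beyond plain subsumption hierarchies (e.g. to heavyweight ontologies with axioms, or across distinct ontologies) is exactly where the difficulty discussed in this theme lies.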
Knowledge visualisation
The growth of knowledge visualisation, or visual analytics, presented by its current protagonists as an interdisciplinary field, has its basis in a long tradition (e.g. Bertin's semiology of graphics, 1967, or Tukey's exploratory data analysis, 1977) which highlights users' need for a coupling between data mining and visualisation. Starting from the observation that current knowledge extraction methods cannot be used in an intuitive, rapid and interactive framework, the aim is to go beyond visualisation as a simple visual representation of the results obtained by automatic algorithms: "visual analytics is more than visualization and can rather be seen as an integrated approach combining visualization, human factors and data analysis". It is therefore necessary to rely on recent technologies (e.g. programming languages, physical supports and effectors, programmable graphics boards) to develop new approaches to visual data exploration that include the user in the mining process. Our research falls within this stream, and the positioning we have chosen is the use of 3D and immersive environments. These approaches, not yet widely developed in the data mining community, rely on technologies which are now spreading rapidly.
One of our objectives is to embed the user's preferences in the knowledge extraction phase. Within this broad framework, we focus on two research directions: modelling the user's preferences by means of multicriteria decision-aiding techniques, and embedding these preferences in post-mining algorithms through adapted interactive visual interfaces, the preferences then playing the role of a heuristic.
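As one simple instance of a multicriteria decision-aiding technique (a weighted-sum aggregation, which is only one of many possible models), user preferences over rule quality measures can be turned into a ranking of extracted rules. The rules, measure values and weights below are hypothetical:

```python
# Hypothetical extracted rules with precomputed quality measures.
rules = [
    {"rule": "a -> c",   "confidence": 0.90, "lift": 1.1, "support": 0.40},
    {"rule": "b -> d",   "confidence": 0.70, "lift": 2.0, "support": 0.10},
    {"rule": "a,b -> e", "confidence": 0.60, "lift": 1.5, "support": 0.25},
]

# User preferences expressed as criterion weights (summing to 1).
weights = {"confidence": 0.5, "lift": 0.3, "support": 0.2}

def normalize(rules, criterion):
    """Rescale one criterion to [0, 1] across the rule set."""
    vals = [r[criterion] for r in rules]
    lo, hi = min(vals), max(vals)
    return {r["rule"]: (r[criterion] - lo) / (hi - lo) if hi > lo else 0.0
            for r in rules}

def score(rules, weights):
    """Weighted-sum aggregation: one global preference score per rule."""
    norm = {c: normalize(rules, c) for c in weights}
    return {r["rule"]: sum(w * norm[c][r["rule"]] for c, w in weights.items())
            for r in rules}

# Rank the rules by decreasing aggregated score.
for rule, s in sorted(score(rules, weights).items(), key=lambda kv: -kv[1]):
    print(f"{rule}: {s:.3f}")
```

In an interactive setting, adjusting the weights (e.g. through a visual interface) re-ranks the rules immediately, which is how such preferences can act as a heuristic during post-mining exploration.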