ImagEVAL assesses the image-processing technology needed to sort, find and describe still images held in large databases. The assessment focuses on features reflecting how collection holders expect images to be used, as described by a panel of participants from the defence, industrial and cultural sectors.
Description of the tasks
Detailed description of the tasks [pdf]
Presentation of ImagEVAL during the ImageCLEF workshop [pdf] (Alicante, 19/09/2006)
Presentation at the TechnoVision meeting (in French), March 2006 [pdf]
RECOGNISING TRANSFORMED IMAGES
The invariance of indexing techniques to certain geometric or chromatic transformations is fundamental to protecting the intellectual property of visual content and to sorting multimedia streams. The aim is to find all images in the database that are derived from the queried original image. A set of N images was automatically transformed using various algorithms:
To make the task more complex, combinations of transformations were used. The training set is composed of about 4,500 transformed images and the official database of about 45,000 images (N=2,500). Two tasks were organized:
Queries: 50 queries were proposed for the first task and 60 for the second. The images were chosen to represent the diversity of the database. To make the task harder, visually similar images were also included in the database.
COMBINED TEXT/IMAGE SEARCH
Many published images are accompanied by text (internet pages, for example). Such images play an illustrative role and fall within a semantic context that can be analysed through a linguistic study of the textual data. The aim is to assess how text and image techniques work together to improve similarity-based image search, within the framework of information retrieval on text/image data. As this task is “search on the web” oriented, the database was created by extracting pages from the Web, mainly from Wikipedia for copyright reasons. The web pages were found using classical search engines (Google and AllTheWeb), and an automatic segmentation of each page into text and images was performed. For this first edition, the web pages are in French. The link between each image and its position in the text is kept. The database is composed of a list of 700 URLs and the corresponding text and image files. Participants were also invited to use their own web-page segmentation tools. Using Wikipedia, we focused on more “encyclopaedic” and “picturable” topics:
The goal of the task is to find all images answering a query composed of keywords and a few positive example images. Queries: a query is a combination of keywords (for instance “Tour Eiffel”) and a few relevant images (which did not come from the database). 25 queries have been selected: bee, avocado, tennis ball, lemon, ladybird, Ethiopian flag, European flag, Picasso's Guernica, la Joconde (the Mona Lisa), lava flow, Delacroix's “Liberté guidant le peuple”, Great Wall of China, Percé Rock, clownfish, Siamese cat, tennis court, Ayers Rock, zebra, Eiffel Tower, Statue of Liberty, Niagara Falls, teddy bear, screwdriver, poplar tree, map of Norway. MAP is the principal metric.
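One common way to make text and image techniques work together, as this task evaluates, is late fusion: each candidate image receives a textual relevance score and a visual similarity score, and the two are combined before ranking. The sketch below is a hypothetical illustration; the function name and the weight `alpha` are assumptions, not part of the campaign protocol:

```python
def fuse_scores(text_scores, image_scores, alpha=0.5):
    """Rank images by a weighted sum of textual and visual scores.

    text_scores / image_scores map image ids to similarity values in [0, 1].
    alpha is an illustrative weight; a real system would tune it on
    training queries. Returns image ids, best first.
    """
    keys = set(text_scores) | set(image_scores)
    fused = {k: alpha * text_scores.get(k, 0.0)
                + (1 - alpha) * image_scores.get(k, 0.0)
             for k in keys}
    return sorted(fused, key=fused.get, reverse=True)
```

For example, an image whose surrounding text barely matches the keywords can still rank first if it is visually very close to the positive example images.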
DETECTING TEXTUAL ZONES IN AN IMAGE
Textual data can sometimes be found within an image or superimposed on it. This text is an important source of information for identifying people, places or, on a wider scale, the context of the image. The aim of this task is to locate, extract and then identify textual elements in an image. Because of the very high heterogeneity of the database (old postcards and news photos), this task proved more difficult than expected; we believe the challenge posed by ImagEVAL was greater than in a specialised campaign like ICDAR. The task focused only on text-area detection, not on the recognition of the character strings. For this task, the choice of metric was a hard problem, which can be seen as a classical object-detection evaluation problem. The principal metric retained was proposed by Christian Wolf and Jean-Michel Jolion [LIRIS INSA/CNRS]. A complementary metric is the more classical Precision and Recall, adapted from ICDAR 2003. The Wolf and Jolion metric enables a better evaluation of over- and under-segmentation problems. The main idea is to consider different types of matching between ground-truth bounding boxes and participants' bounding boxes: one-to-one, one-to-many and many-to-one, as presented in the following figure:
See the DetEVAL web pages for information and available tools for this metric. The database is composed of 500 images. Each image contains a legend (postcard) or a text area that is part of the scene.
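As an illustration of this kind of box-level matching, the sketch below tests a one-to-one match between a ground-truth box and a detected box using area-recall and area-precision thresholds. The threshold values and function names are illustrative assumptions, not the exact DetEVAL implementation (which also handles the one-to-many and many-to-one cases):

```python
def inter_area(a, b):
    """Intersection area of two axis-aligned (x1, y1, x2, y2) boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def match_one_to_one(gt, det, tr=0.8, tp=0.4):
    """One-to-one match: the detection covers enough of the ground truth
    (area recall >= tr) without being too loose (area precision >= tp).
    tr and tp are illustrative threshold values."""
    inter = inter_area(gt, det)
    return inter / area(gt) >= tr and inter / area(det) >= tp
```

Counting one-to-many matches (one ground-truth box split across several detections) with a partial reward is what lets the metric penalise over-segmentation less brutally than strict one-to-one Precision/Recall.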
DETECTING OBJECTS
Identifying objects in any sort of image is at the heart of most technological breakthroughs in image processing, as the technique applies to every area where image processing is involved (defence, civil, retail, surveillance, etc.). In this task, the participant must recognise, in a photographic database, the photos that represent a given object, by modelling the object from an image base representing it. This task therefore involves object detection using a limited learning database. Ten objects or classes of objects are considered:
The main objective of the task is to evaluate a system's ability to detect a particular object (American flag) or a class of objects (car) using limited learning data. Thus, the dictionary database is composed of about 100 images per entity. Participants must use only these data for the first run, but may use complementary data (from the Internet or personal image collections) for other runs. The learning database is composed of 743 images:
The test database is composed of 14,000 images with high but realistic complexity (scales, compositions, occlusions, etc.), mainly coming from the HACHETTE Photos database [see]. Each image contains one, several or no objects. About 5,000 images containing none of the ten objects complete the database.
RECOGNISING ATTRIBUTES
The automatic classification of images is an important challenge. Analysing an image makes it possible to recognise certain highly useful attributes, so that images can be sorted or classified in a fully automatic way. In this task, participants must identify all images containing certain attributes. These attributes relate to the very nature of the image as well as to its context and composition. This task can be seen as a classification or automatic annotation task; the semantics to be extracted from unannotated images involve the nature and the context of the image. Ten attributes are considered:
The attributes are organized in a shallow semantic tree:
The first run must return results obtained using only the given learning images, but other runs may use any supplementary data. The queries are an attribute or a series of attributes. MAP and Recall/Precision measures are used for the evaluation. The learning database is composed of 5,474 images:
A ground-truth file is provided, giving the membership of each image in the different attributes. Queries: 13 queries have been proposed:
PROTOCOLS AND ORGANISATION
ImagEVAL is organized in 3 steps:
The main metric (for ranking) is the MEAN AVERAGE PRECISION. For each task in ImagEVAL, the protocol, organisation and listing of the different files are explained in the document ImagEVAL_info_tasks_eng.pdf.
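Mean Average Precision is the mean, over all queries, of the average of the precision values at each rank where a relevant image appears. A minimal sketch (function and variable names are illustrative):

```python
def average_precision(ranked, relevant):
    """AP for one query: mean precision at each rank holding a relevant image."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over several queries; each run is (ranked_list, relevant_set)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For instance, a system returning a relevant image at ranks 1 and 3 out of 2 relevant images gets an AP of (1/1 + 2/3) / 2 ≈ 0.83 for that query.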