FC Database

Annotated database of on-line sketched diagrams from the FlowChart domain.

The Database

This is a database of on-line sketched diagrams from the flowchart domain. Its purpose is to provide a benchmark for diagram recognition algorithms. It contains 672 diagrams drawn by 24 users; each user was asked to draw 28 diagrams according to given patterns. The data were collected with a standard Lenovo X61 tablet PC and are stored in the InkML format. Individual strokes contain information about the pen-tip position, time, and pressure. Symbols and the relations among them are annotated. Additionally, each text block is assigned the meaning of its text, and each arrow has defined connection points, a head, and a shaft.
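InkML stores each stroke as a `<trace>` element. A minimal Python sketch for loading the strokes of one file with the standard library follows. It assumes that trace values are written as plain absolute numbers (no InkML difference encoding), that the channel order is whatever the first `<traceFormat>` element declares, and that traces carry either an `id` or an `xml:id` attribute. `read_traces` is a hypothetical helper, not part of the distribution.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


def read_traces(source):
    """Return (channel_names, traces) for an InkML file path or binary file object.

    traces maps each trace id to a list of points, one tuple of floats per
    sampled point, in the channel order of the first <traceFormat> found.
    Assumption: plain absolute values, no InkML difference-encoding prefixes.
    """
    root = ET.parse(source).getroot()
    channels = []
    for elem in root.iter():
        if _local(elem.tag) == "traceFormat":
            channels = [c.get("name") for c in elem if _local(c.tag) == "channel"]
            break
    traces = {}
    for elem in root.iter():
        if _local(elem.tag) != "trace":
            continue
        points = []
        for chunk in (elem.text or "").split(","):  # points are comma-separated
            values = chunk.split()                   # channel values are space-separated
            if values:
                points.append(tuple(float(v) for v in values))
        traces[elem.get("id") or elem.get(XML_ID)] = points
    return channels, traces
```

For example, `channels, traces = read_traces("diagram.inkml")` (a hypothetical file name) yields the stroke data that the `traceDataRef` attributes of the symbol annotations refer to.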

Download

The database is divided into a training set (10 users, 280 diagrams), a validation set (7 users, 196 diagrams), and a test set (7 users, 196 diagrams). It is free to download.

file: FC_database_1.0.zip

We also provide instances of the max-sum problem corresponding to the diagrams in the database. The instances were obtained with the recognition system described in [1]. They are saved in the data format used by toulbar2 (folder format_1) and in a format used by the energy minimization community (folder format_2). We believe these instances may be of interest to researchers in optimization.

max-sum instances: FC_maxsum_1.0.zip
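For orientation, the general max-sum problem is stated below. This is the textbook formulation, not a description of the specific model in [1]: $V$ is a set of variables, each $v \in V$ with a finite label set $Y_v$; $F$ is a set of factors; $\mathbf{y}_f$ is the restriction of a labeling $\mathbf{y}$ to the variables of factor $f$; and $g_f$ is the quality of that partial labeling. The actual arity and meaning of the factors in the provided instances should be taken from [1] and from the instance files.

```latex
\mathbf{y}^{*} \in \operatorname*{arg\,max}_{\mathbf{y} \in \prod_{v \in V} Y_v} \sum_{f \in F} g_f(\mathbf{y}_f)
\;=\; \operatorname*{arg\,min}_{\mathbf{y} \in \prod_{v \in V} Y_v} E(\mathbf{y}),
\qquad
E(\mathbf{y}) = -\sum_{f \in F} g_f(\mathbf{y}_f)
```

Negating the qualities turns the maximization into an equivalent energy minimization, which is the convention used by most min-sum solvers.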

Parsing the Database

The database is stored in the InkML format and should therefore be easy to parse. Nevertheless, we describe below how symbols, arrow connection points, and relations are annotated. The database contains the following symbols:

  • arrow
  • connection
  • data
  • decision
  • process
  • terminator
  • text

Symbols are annotated as shown below. Note that the textMeaning annotation is defined only for text, and the connectionPointFrom and connectionPointTo annotations only for arrows. The head and shaft elements are likewise defined only for arrows; each contains a list of strokes that is a subset of the strokes of the whole symbol. The from and to attributes are present only when a stroke was split and just one part of it is referenced.

<traceGroup id="[id]">
  <annotation type="truth">[symbol_name]</annotation>
  <annotation type="textMeaning">[text_meaning]</annotation>
  <annotation type="connectionPointFrom">[relative_stroke_id] [point_id]</annotation>
  <annotation type="connectionPointTo">[relative_stroke_id] [point_id]</annotation>
  <traceView traceDataRef="[stroke_1_id]" />
  <traceView traceDataRef="[stroke_2_id]" />
  .
  .
  .
  <head>
   <traceView traceDataRef="[head_stroke_1_id]" from="[point_id]" to="[point_id]" />
   .
   .
   .
  </head>
  <shaft>
   <traceView traceDataRef="[shaft_stroke_1_id]" from="[point_id]" to="[point_id]" />
   .
   .
   .
  </shaft>
</traceGroup>
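The following Python sketch reads the symbol annotations described above from a parsed InkML document. It is an illustration under assumptions: `Symbol` and `read_symbols` are hypothetical names, only `traceGroup` elements that carry a `truth` annotation are treated as symbols, `[relative_stroke_id]` and `[point_id]` are assumed to be integers, and the `from`/`to` indices are kept as plain integers without being applied to the stroke data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Optional

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _local(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


def _refs(parent):
    """(stroke_id, from, to) for every direct <traceView> child; from/to may be None."""
    out = []
    for tv in parent:
        if _local(tv.tag) == "traceView":
            lo, hi = tv.get("from"), tv.get("to")
            out.append((tv.get("traceDataRef"),
                        int(lo) if lo is not None else None,
                        int(hi) if hi is not None else None))
    return out


@dataclass
class Symbol:
    id: str
    name: str                                # truth annotation: arrow, text, process, ...
    strokes: list                            # (stroke_id, from, to) triples of the whole symbol
    text_meaning: Optional[str] = None       # text symbols only
    connection_from: Optional[tuple] = None  # arrows only: (relative_stroke_id, point_id)
    connection_to: Optional[tuple] = None    # arrows only
    head: list = field(default_factory=list)   # arrows only: subset of strokes
    shaft: list = field(default_factory=list)  # arrows only: subset of strokes


def read_symbols(root):
    """Map symbol id -> Symbol for every annotated <traceGroup> below root."""
    symbols = {}
    for group in root.iter():
        if _local(group.tag) != "traceGroup":
            continue
        notes = {a.get("type"): (a.text or "").strip()
                 for a in group if _local(a.tag) == "annotation"}
        if "truth" not in notes:
            continue
        sym = Symbol(id=group.get("id") or group.get(XML_ID), name=notes["truth"],
                     strokes=_refs(group), text_meaning=notes.get("textMeaning"))
        for key, attr in (("connectionPointFrom", "connection_from"),
                          ("connectionPointTo", "connection_to")):
            if key in notes:
                stroke, point = notes[key].split()
                setattr(sym, attr, (int(stroke), int(point)))
        for child in group:
            if _local(child.tag) == "head":
                sym.head = _refs(child)
            elif _local(child.tag) == "shaft":
                sym.shaft = _refs(child)
        symbols[sym.id] = sym
    return symbols
```

A typical call would be `read_symbols(ET.parse("diagram.inkml").getroot())`, where the file name is hypothetical.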

Relations are binary (text labeling a symbol) or ternary (arrow connecting two symbols). For a binary relation, the third symbol reference is not defined and the first reference is always the text. For a ternary relation, the first reference is the arrow, the second is the symbol the arrow comes out of, and the third is the symbol the arrow points into. The database contains the following relations:

  • arrow_connection
  • text_inside
  • arrow_label

<symbolGroup id="[id]" arity="[arity_num]">
  <annotation type="truth">[relation_name]</annotation>
  <symbolView symbolDataRef="[symbol_1_id]" />
  <symbolView symbolDataRef="[symbol_2_id]" />
  <symbolView symbolDataRef="[symbol_3_id]" />
</symbolGroup>
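A matching Python sketch for relations follows, again with hypothetical names (`Relation`, `read_relations`). It relies on the reference order described above and assumes that, in a binary relation, the second reference is the labeled symbol and the undefined third reference is either omitted or left empty. The arity is read from the `arity` attribute, and a group whose reference count does not match it is rejected.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def _local(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


@dataclass
class Relation:
    id: str
    name: str                    # arrow_connection, text_inside or arrow_label
    arity: int                   # 2 (text labeling a symbol) or 3 (arrow connecting two symbols)
    first: str                   # the arrow (ternary) or the text (binary)
    second: str                  # symbol the arrow comes out of (ternary) or the labeled symbol (binary)
    third: Optional[str] = None  # symbol the arrow points into; None for binary relations


def read_relations(root: ET.Element) -> List[Relation]:
    """Collect every <symbolGroup> below root as a Relation, in document order."""
    relations = []
    for group in root.iter():
        if _local(group.tag) != "symbolGroup":
            continue
        gid = group.get("id") or group.get(XML_ID)
        name = ""
        for child in group:
            if _local(child.tag) == "annotation" and child.get("type") == "truth":
                name = (child.text or "").strip()
        # Assumption: an undefined third reference is either omitted or left empty.
        refs = [child.get("symbolDataRef") for child in group
                if _local(child.tag) == "symbolView" and child.get("symbolDataRef")]
        arity = int(group.get("arity", len(refs)))
        if arity not in (2, 3) or len(refs) != arity:
            raise ValueError(f"symbolGroup {gid}: arity {arity}, {len(refs)} references")
        relations.append(Relation(id=gid, name=name, arity=arity, first=refs[0],
                                  second=refs[1], third=refs[2] if arity == 3 else None))
    return relations
```

Combined with a symbol lookup, the `arrow_connection` relations give the edges of the flowchart graph: each one links `second` to `third` through the arrow `first`.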

References

If you need to refer to the database, please cite the following paper:

[1] Bresler M., Průša D., Hlaváč V.: On-line recognition of sketched arrow-connected diagrams. International Journal on Document Analysis and Recognition (IJDAR). DOI 10.1007/s10032-016-0269-z. To be published.