FA Database

Annotated database of on-line sketched
diagrams from the finite automata domain.

The Database

This is a database of on-line sketched diagrams from the finite automata (also known as finite state machine) domain. The purpose of the database is to provide a benchmark for diagram recognition algorithms. It contains 300 diagrams drawn by 25 users. Each user was asked to draw 12 diagrams according to given patterns. The data were collected with a standard Lenovo X61 tablet PC and are stored in the InkML format. Individual strokes contain information about the position of the pen tip, time, and pressure. Annotation of symbols and of the relations among them is provided. Additionally, text blocks are assigned the meaning of their text, and arrows have defined connection points.

Download

The database is divided into a training set (11 users, 132 diagrams), a validation set (7 users, 84 diagrams), and a test set (7 users, 84 diagrams). It is free to download.

file: FA_database_1.1.zip
changelog:

  • Arrows contain annotation of their heads and shafts. If a head and a shaft were drawn with a single stroke, a proposed split point is annotated.

We also provide instances of the max-sum problem (a.k.a. discrete energy minimization/maximization) corresponding to the diagrams from the database. The instances were obtained using the recognition system described in the paper [2]. They are saved in the data format used by Toulbar2 (folder format_1) and in a format used by the energy minimization community (folder format_2). We believe that these instances might be of interest to researchers working on optimization.

max-sum instances: FA_maxsum_1.1.zip

Parsing the Database

The database is stored in the InkML format and thus should be easy to parse. Below we describe how symbols, connection points of arrows, and relations are annotated. The list of symbols in the database is as follows:

  • state
  • final state
  • arrow
  • label

Symbols are annotated in the following way. Note that the textMeaning annotation is defined for labels only, and the connectionPointFrom and connectionPointTo annotations are defined for arrows only. The head and shaft elements are also defined for arrows only. Each of these two elements contains a list of strokes, which is a subset of the strokes of the whole symbol. The from and to attributes are defined only when a stroke was split and just one part of it is referenced.

<traceGroup id="[id]">
  <annotation type="truth">[symbol_name]</annotation>
  <annotation type="textMeaning">[text_meaning]</annotation>
  <annotation type="connectionPointFrom">[relative_stroke_id] [point_id]</annotation>
  <annotation type="connectionPointTo">[relative_stroke_id] [point_id]</annotation>
  <traceView traceDataRef="[stroke_1_id]" />
  <traceView traceDataRef="[stroke_2_id]" />
  .
  .
  .
  <head>
   <traceView traceDataRef="[head_stroke_1_id]" from="[point_id]" to="[point_id]" />
   .
   .
   .
  </head>
  <shaft>
   <traceView traceDataRef="[shaft_stroke_1_id]" from="[point_id]" to="[point_id]" />
   .
   .
   .
  </shaft>
</traceGroup>
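
The symbol template above can be read with a few lines of Python. The following is a minimal sketch, not part of the database distribution: the function names, the sample document, and its ids (sym_0, t5, ...) are hypothetical, and the code assumes that symbol groups carry a plain id attribute as in the template. Tag names are compared without their namespace so that the code works whether or not the files declare the InkML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the template above; not taken from the database.
SYMBOLS_SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <traceGroup id="sym_0">
    <annotation type="truth">arrow</annotation>
    <annotation type="connectionPointFrom">0 0</annotation>
    <annotation type="connectionPointTo">0 41</annotation>
    <traceView traceDataRef="t5" />
    <head>
      <traceView traceDataRef="t5" from="30" to="41" />
    </head>
    <shaft>
      <traceView traceDataRef="t5" from="0" to="30" />
    </shaft>
  </traceGroup>
  <traceGroup id="sym_1">
    <annotation type="truth">label</annotation>
    <annotation type="textMeaning">a</annotation>
    <traceView traceDataRef="t6" />
  </traceGroup>
</ink>"""


def local(tag):
    """Return the tag name without its XML namespace prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def parse_symbols(inkml_text):
    """Map each symbol id to its name, annotations, strokes, head and shaft."""
    symbols = {}
    for group in ET.fromstring(inkml_text).iter():
        if local(group.tag) != "traceGroup":
            continue
        notes, strokes = {}, []
        parts = {"head": [], "shaft": []}
        for child in group:
            kind = local(child.tag)
            if kind == "annotation":
                notes[child.get("type")] = (child.text or "").strip()
            elif kind == "traceView":
                strokes.append(child.get("traceDataRef"))
            elif kind in parts:
                # from/to are None when the whole stroke is referenced.
                parts[kind] = [
                    (v.get("traceDataRef"), v.get("from"), v.get("to"))
                    for v in child
                    if local(v.tag) == "traceView"
                ]
        if "truth" not in notes:
            continue  # assumption: groups without a truth annotation are not symbols
        symbols[group.get("id")] = {
            "name": notes["truth"],
            "text": notes.get("textMeaning"),        # labels only
            "from": notes.get("connectionPointFrom"),  # arrows only
            "to": notes.get("connectionPointTo"),      # arrows only
            "strokes": strokes,
            "head": parts["head"],
            "shaft": parts["shaft"],
        }
    return symbols


symbols = parse_symbols(SYMBOLS_SAMPLE)
print(symbols["sym_0"]["name"], symbols["sym_0"]["head"])
```

The connection points are kept as raw strings here, since how the stroke and point indices should be interpreted is best checked against the actual files.
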
Relations are binary (an initial arrow entering a state, or a labeling of a symbol) or ternary (an arrow connecting two symbols). If the relation is binary, the third reference to a symbol is not defined and the first reference is always the label (in the case of labeling) or the arrow (in the case of an arrow entering an initial state). If the relation is ternary, the first reference is the arrow, the second reference is the symbol the arrow leaves, and the third reference is the symbol the arrow enters. The list of relations in the database:

  • arrow_in
  • arrow_connection
  • text_inside
  • arrow_label

<symbolGroup id="[id]" arity="[arity_num]">
  <annotation type="truth">[relation_name]</annotation>
  <symbolView symbolDataRef="[symbol_1_id]" />
  <symbolView symbolDataRef="[symbol_2_id]" />
  <symbolView symbolDataRef="[symbol_3_id]" />
</symbolGroup>
Note that initial arrows are annotated in the same way as arrows connecting two symbols. However, they can easily be identified using the information about the relationships.
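
The relation template and the remark about initial arrows can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names and the sample document are hypothetical, and we read arrow_in as the binary relation of an initial arrow entering a state, which is inferred from the relation names rather than stated explicitly. Undefined references (missing or empty symbolDataRef) are dropped, so a binary relation yields two references.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the relation template; not taken from the database.
RELATIONS_SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <symbolGroup id="rel_0" arity="2">
    <annotation type="truth">arrow_in</annotation>
    <symbolView symbolDataRef="sym_0" />
    <symbolView symbolDataRef="sym_1" />
  </symbolGroup>
  <symbolGroup id="rel_1" arity="3">
    <annotation type="truth">arrow_connection</annotation>
    <symbolView symbolDataRef="sym_2" />
    <symbolView symbolDataRef="sym_1" />
    <symbolView symbolDataRef="sym_3" />
  </symbolGroup>
</ink>"""


def local(tag):
    """Return the tag name without its XML namespace prefix, if any."""
    return tag.rsplit("}", 1)[-1]


def parse_relations(inkml_text):
    """Return a list of (relation_name, [symbol ids in reference order])."""
    relations = []
    for group in ET.fromstring(inkml_text).iter():
        if local(group.tag) != "symbolGroup":
            continue
        name, refs = None, []
        for child in group:
            kind = local(child.tag)
            if kind == "annotation" and child.get("type") == "truth":
                name = (child.text or "").strip()
            elif kind == "symbolView" and child.get("symbolDataRef"):
                refs.append(child.get("symbolDataRef"))
        relations.append((name, refs))
    return relations


def initial_arrows(relations):
    """Ids of arrows that take part in an arrow_in relation (first reference)."""
    return {refs[0] for name, refs in relations if name == "arrow_in" and refs}


relations = parse_relations(RELATIONS_SAMPLE)
print(initial_arrows(relations))
```

For ternary arrow_connection relations, the three references in the returned list are the arrow, its source symbol, and its target symbol, in that order.
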

References

If you use this database in your research, please cite the paper [1].

[1] Bresler M., Van Phan T., Průša D., Nakagawa M., Hlaváč V.: Recognition system for on-line sketched diagrams. In: J.E. Guerrero (ed.) ICFHR 2014: Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, pp. 563-568. IEEE Computer Society, 10662 Los Vaqueros Circle, Los Alamitos, USA (2014). DOI 10.1109/ICFHR.2014.100
[2] Bresler M., Průša D., Hlaváč V.: On-line recognition of sketched arrow-connected diagrams. International Journal on Document Analysis and Recognition (IJDAR). DOI 10.1007/s10032-016-0269-z. To be published.