Extracting Structured Data from Web Pages: Experiments



Collection Name: ATP Tennis Players' Profiles
Source: Australian Open
Number of Pages: 32
Automatically Extracted Template: template.xml
Automatically Extracted Schema: schema.xml
Manually Deduced Schema: manschema.txt
Equivalence Classes: eq.cls
Source Pages and Their Automatically Extracted Values
Page Source    Extracted Value
1.html         value
2.html         value
3.html         value
4.html         value
5.html         value
6.html         value
7.html         value
8.html         value
9.html         value
10.html        value
11.html        value
12.html        value
13.html        value
14.html        value
15.html        value
16.html        value
17.html        value
18.html        value
19.html        value
20.html        value
21.html        value
22.html        value
23.html        value
24.html        value
25.html        value
26.html        value
27.html        value
28.html        value
29.html        value
30.html        value
31.html        value
32.html        value