Extracting Structured Data from Web Pages: Experiments



Collection Name: Netflix Movies
Source: Netflix
Number of Pages: 50
Automatically Extracted Template: template.xml
Automatically Extracted Schema: schema.xml
Manually Deduced Schema: manschema.txt
Equivalence Classes: eq.cls
Source Pages and Their Automatically Extracted Values
Page Source  Extracted Value
1.html       value
2.html       value
3.html       value
4.html       value
5.html       value
6.html       value
7.html       value
8.html       value
9.html       value
10.html      value
11.html      value
12.html      value
13.html      value
14.html      value
15.html      value
16.html      value
17.html      value
18.html      value
19.html      value
20.html      value
21.html      value
22.html      value
23.html      value
24.html      value
25.html      value
26.html      value
27.html      value
28.html      value
29.html      value
30.html      value
31.html      value
32.html      value
33.html      value
34.html      value
35.html      value
36.html      value
37.html      value
38.html      value
39.html      value
40.html      value
41.html      value
42.html      value
43.html      value
44.html      value
45.html      value
46.html      value
47.html      value
48.html      value
49.html      value
50.html      value