Sources of data

Linked data

RDF vocabularies and ontologies used

The data is provided in the Turtle serialization of RDF, compressed with gzip.
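As a quick way to see which vocabularies a dump actually uses, the gzipped Turtle can be streamed and its `@prefix` declarations collected. The following is a minimal sketch using only the Python standard library. The function name `read_prefixes` is hypothetical, and the line-based regular expression is an assumption that holds only when each `@prefix` directive sits on its own line. It is not a full Turtle parser, and a proper RDF library would be the safer choice for anything beyond inspection.

```python
import gzip
import re

# Matches Turtle directives of the form: @prefix pc: <http://example.org/ns#> .
# Assumption: one directive per line; SPARQL-style "PREFIX" lines are ignored.
PREFIX_RE = re.compile(r"^\s*@prefix\s+([^\s:]*):\s*<([^>]*)>\s*\.")


def read_prefixes(path):
    """Return {prefix: namespace IRI} for @prefix lines in a gzipped Turtle file."""
    prefixes = {}
    # Mode "rt" decompresses on the fly and decodes the bytes as UTF-8 text.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            match = PREFIX_RE.match(line)
            if match:
                prefixes[match.group(1)] = match.group(2)
    return prefixes
```

Calling `read_prefixes` on a downloaded dump (the file name depends on where the archive was saved) returns a dictionary that can be checked for the expected `pc` prefix before any heavier processing.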

Tasks

Task A1

Training dataset

  • Size of data (Gzipped): 61 MB
  • Number of RDF triples: 6 223 018
  • Number of instances of pc:Contract: 70 543
  • Number of links to DBpedia: 41 468
  • Number of links to OpenCorporates: 14 894

Testing dataset

  • Size of data (Gzipped): 1.2 MB
  • Number of RDF triples: 67 007
  • Number of instances of pc:Contract: 788
  • Number of links to DBpedia: 177
  • Number of links to OpenCorporates: 0 (testing dataset does not contain information about bidders)

Format of results

The results for the task will be delivered in CSV format with two columns. The first column will contain the URI of an annotated public contract (an instance of pc:Contract). The second column will contain the predicted number of tenders for that public contract, expressed as a positive integer.

Example
contract,numberOfTenders
"http://linked.opendata.cz/resource/domain/fbo.gov/contract/AG-02NV-S-14-7000",13
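A result file in this shape could be produced as sketched below, using Python's standard `csv` module. The function name `write_predictions` and the input structure (an iterable of URI and count pairs) are assumptions made for illustration. The quoting choice simply mirrors the example above, where the URI is quoted and the integer is not.

```python
import csv


def write_predictions(predictions, out):
    """Write (contract URI, predicted tender count) pairs as two-column CSV."""
    # The header is written by hand so that it stays unquoted, as in the example.
    out.write("contract,numberOfTenders\n")
    # QUOTE_NONNUMERIC quotes the URI strings and leaves the integers bare.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for uri, count in predictions:
        # The task asks for a positive integer, so reject anything else early.
        if isinstance(count, bool) or not isinstance(count, int) or count < 1:
            raise ValueError(f"tender count for {uri} must be a positive integer")
        writer.writerow([uri, count])
```

To write to disk, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` as `out`, which keeps line endings under the writer's control.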

Task A2

The dataset is the same as the training dataset for Task A1.

License information

Data extracted from USASpending.gov, FedBizOpps.gov and FAR Codes is in the public domain. The datasets also include data from DBpedia, OpenCorporates and the North American Industry Classification System (NAICS). Data from DBpedia is dual-licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) license and the GNU Free Documentation License. Data from OpenCorporates.com is provided under the Open Database License, which is also the license used for the RDF version of NAICS.
