ANTCorpus project
Arabic News Texts Corpus

About the project


ANTCorpus stands for "Arabic News Texts Corpus". It is a research project that aims to collect Arabic news texts from a variety of web sources, growing the amount of data incrementally over time.

The acronym ANT also recalls the way ants work:
"Every ant should contribute to building the nest progressively".



How it works


Collect articles from the RSS feeds of news websites

Filter and extract categorized data

Generate corpus documents
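
The three steps above can be illustrated with a minimal Python sketch. It is not the project's actual tooling, which this page does not describe: the sample feed, the function names `extract_items` and `write_corpus`, and the one-folder-per-category layout are all assumptions made for illustration. A real pipeline would first download the feeds and would probably fetch the full article text instead of relying on the feed description.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical RSS 2.0 sample; real feeds would be downloaded from news websites.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>عنوان رياضي</title>
      <category>sport</category>
      <description>نص الخبر الرياضي</description>
    </item>
    <item>
      <title>عنوان بدون تصنيف</title>
      <description>نص بلا تصنيف</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Steps 1 and 2: parse the feed and keep only items that carry a category."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        category = (item.findtext("category") or "").strip()
        if not category:
            continue  # filter out uncategorized items
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "category": category,
            "text": (item.findtext("description") or "").strip(),
        })
    return items


def write_corpus(items, out_dir):
    """Step 3: write one UTF-8 text file per item, grouped by category (assumed layout)."""
    out_dir = Path(out_dir)
    paths = []
    for index, doc in enumerate(items, start=1):
        category_dir = out_dir / doc["category"]
        category_dir.mkdir(parents=True, exist_ok=True)
        path = category_dir / f"{index:05d}.txt"
        path.write_text(doc["title"] + "\n" + doc["text"] + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_corpus(extract_items(SAMPLE_RSS), "corpus")` would create a `corpus/sport/` folder containing a single document, since the second sample item has no category and is filtered out.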

The team behind the scenes


Technical staff


Dr. Oussama Ben Khiroun

oussama [DOT] ben [DOT] khiroun [@] gmail [DOT] com

Dr. Raja Ayed

ayed [DOT] raja [@] gmail [DOT] com

Amina Chouigui

aminachouigui [@] gmail [DOT] com

Head staff


Dr. Bilel Elayeb

Download


Download Version 1.1 (Number of documents = 10,161 | Release date = 11 June 2018)
Download Version 1.0 (Number of documents = 6,005 | Release date = 12 August 2017)

Citation Licence

The files of ANT Corpus are subject to the following citation licence:

By downloading ANT Corpus, you agree to cite at least one of our papers describing ANT Corpus (see the Publications section below) and/or refer to the project's main page in any material you produce in which ANT Corpus was used for research or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation.
✅ By using this data, you have agreed to the citation licence.

Publications

📄 A. Chouigui, O. Ben Khiroun and B. Elayeb. ANT Corpus: An Arabic News Text Collection for Textual Classification. In proceedings of the 14th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2017), pp. 135-142, Hammamet, Tunisia, October 30 - November 3, 2017.

📄 A. Chouigui, O. Ben Khiroun and B. Elayeb. A TF-IDF and Co-occurrence Based Approach for Events Extraction from Arabic News Corpus. In proceedings of the 23rd International Conference on Natural Language & Information Systems (NLDB 2018), pp. 272-280, Paris, France, 13-15 June 2018.

📄 A. Chouigui, O. Ben Khiroun and B. Elayeb. Related Terms Extraction from Arabic News Corpus using Word Embedding. In: OTM Conferences & Workshops: Proceedings of the 7th International Workshop on Methods, Evaluation, Tools and Applications for the Creation and Consumption of Structured Data for the e-Society (Meta4eS'18), Springer, LNCS, pp. 1-11, Valletta, Malta, 22-26 October 2018.