BAAI releases open-source Chinese Internet Corpus CCI 4.0
On May 6, the Beijing Academy of Artificial Intelligence (BAAI) released CCI 4.0, a large open-source text dataset, at the GOSIM Global Open Source Innovation Forum in Paris, France. This release covers two languages, Chinese and English; versions in more languages are planned for subsequent releases. The construction of CCI 4.0 was led by BAAI, with joint contributions from many institutions, including Alibaba Cloud, Shanghai Artificial Intelligence Laboratory, Huawei, Mobvoi, Kingsoft Office, Kunlun Wanwei, Mianbi Intelligence, Qihoo Technology, Meituan, Xiyu Technology, Moonshot AI, Zidong Taichu, Zhongke Wenge, and iFlytek.