DeepSeek patent for breadth data collection method is published
2025-04-02 09:12:40

According to intellectual property records on Tianyancha, a patent titled "A Method and System for Breadth Data Collection," filed by DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., an affiliate of Hangzhou DeepSeek, was recently published. According to the abstract, the invention relates to the field of data collection and comprises the following steps: establishing a web page meta-information library; determining the download quota for each daily scheduling unit and the total download quota for the day; selecting a corresponding number of links from the web page meta-information library and allocating download quotas to them; controlling the download process; and post-processing and cleaning the downloaded text before it enters a refill queue, through which the web page meta-information library is updated. The stated benefits of the invention are that it discovers as many web page links as possible while reducing the traffic impact on websites; analyzes content that has already been downloaded to infer the quality of links that have not yet been downloaded; allocates quotas so that higher-quality pages are downloaded first, which reduces downloads and repeated downloads of low-quality pages, improves data quality and download efficiency, and reduces network resource consumption during data collection; and uses a separate information refill queue to ensure that modifications to the web page meta-information library are atomic and stable.
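The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract gives no code, so every name here (`PageMeta`, `select_links`, `crawl_once`, `fake_download`) is hypothetical, the "scheduling unit" is assumed to be one website, and the quality score is a toy stand-in for whatever analysis the patent actually performs.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class PageMeta:
    url: str
    site: str               # scheduling unit (assumption: one unit per site)
    quality: float = 0.5    # estimated quality of the page behind this link
    downloaded: bool = False


def select_links(library, unit_quota, total_quota):
    """Pick the best not-yet-downloaded links while honouring both quotas."""
    used = defaultdict(int)
    chosen = []
    candidates = sorted(
        (m for m in library.values() if not m.downloaded),
        key=lambda m: m.quality,
        reverse=True,
    )
    for meta in candidates:
        if len(chosen) >= total_quota:
            break
        if used[meta.site] < unit_quota:
            used[meta.site] += 1
            chosen.append(meta.url)
    return chosen


def fake_download(url):
    """Stand-in for a real fetch: returns (text, outlinks)."""
    return f"  text of {url}  ", [url + "/next"]


def crawl_once(library, unit_quota, total_quota, download=fake_download):
    refill = deque()  # separate queue: downloads never modify the library
    for url in select_links(library, unit_quota, total_quota):
        text, outlinks = download(url)
        cleaned = text.strip()              # toy post-processing and cleaning
        score = 1.0 if cleaned else 0.0     # toy quality signal
        refill.append((url, score, outlinks))

    # A single writer drains the queue and applies every library update.
    while refill:
        url, score, outlinks = refill.popleft()
        parent = library[url]
        parent.downloaded = True
        parent.quality = score
        for link in outlinks:
            # New links inherit the quality of the page that linked to them.
            library.setdefault(link, PageMeta(link, parent.site, quality=score))
    return library
```

In this sketch the per-site quota limits the traffic any one website receives, the quality ordering gives preferential download to promising links, and routing all updates through one refill queue keeps writes to the library in a single place, which is one simple way to approximate the atomicity the abstract mentions.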
ASIA TECH WIRE
