
Every day, hundreds of crawlers collect data from the websites of radio, TV, online, and print publishers. They monitor, analyze, and summarize this content, supplying the high-quality data at scale that AI depends on.

The Allen Institute for Artificial Intelligence (AI2): LLMs and open datasets for AI

Like Common Crawl, the Allen Institute is a non-profit organization.

It was founded in 2014 by Paul Allen, philanthropist and Microsoft co-founder, to find transformative ways to develop AI. As a non-profit AI research institute, AI2 pursues foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.

Among other projects, AI2 has developed LLMs such as OLMo and Tülu, and provides datasets on Hugging Face.

OLMo 2 is presented as the best fully open language model to date. It includes a family of 7B, 13B, and 32B models trained on up to 6T tokens. OLMo 2 outperforms other fully open models and competes with open-weight models like Llama 3.1 8B.

AI2 partners with the Gates Foundation, the University of Washington, and NAIRR, and with 500+ academic journals on the Semantic Scholar project.

  1. Obeying robots.txt: we do not see AI2 consulting robots.txt.
  2. Stats on Botscorner.

AI2 OLMo's stats show heavy activity on French websites connected to Botscorner: more than 210,000 pages per 24 hours.

3. Estimated revenue: $35M per year, with 325 employees (source: Growjo).
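
The robots.txt observation in point 1 and the volume figure in point 2 are the kind of thing a publisher can check against its own server logs. Below is a minimal sketch, not Botscorner's actual method. It assumes access logs in the common "combined" format and a user-agent token such as "AI2Bot". Both are assumptions for illustration, and the function names are hypothetical.

```python
import re

# Assumed layout (combined log format):
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_summary(lines, agent_token):
    """Count a crawler's requests and how many of them fetched /robots.txt.

    agent_token is a substring of the user-agent header (hypothetical
    example: "AI2Bot"); matching is case-insensitive.
    """
    total = 0
    robots_hits = 0
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # line does not follow the assumed log format
        if agent_token.lower() not in match.group("agent").lower():
            continue  # request came from another client
        total += 1
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            robots_hits += 1
    return {"requests": total, "robots_txt_requests": robots_hits}


def pages_per_second(pages_per_day):
    """Convert a daily page count into an average request rate."""
    return pages_per_day / 86_400  # seconds in 24 hours
```

A count of zero under `robots_txt_requests` over a long enough log window would be consistent with the observation that the crawler is not seen consulting robots.txt. For scale, `pages_per_second(210_000)` comes to roughly 2.4 pages per second on average across the monitored sites.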
