Danish Kapoor

Wikimedia transforms Wikidata for artificial intelligence

The Wikimedia Foundation has launched a major project to make the millions of items in Wikidata better suited to artificial intelligence developers. The Wikimedia Deutschland team in Berlin has taken this enormous database out of its raw state and converted it into a vector format that reflects context. Large language models will now be able not only to retrieve information but also to see how pieces of knowledge relate to one another in meaning. This makes Wikidata not just an archive but a significant source of information for artificial intelligence.

Wikidata, which today hosts more than 100 million items, has grown over the years through the contributions of volunteers. Although the wealth of its content is indisputable, processing the data by machine was often laborious. The new approach lets developers use the data faster and more efficiently. Small teams will be able to benefit from this convenience as much as large companies, which partially evens out inequalities in access to information.

Wikimedia offers fairer Wikidata access for small developers

Lydia Pintscher, Wikidata portfolio lead, says the new database will be especially advantageous for small artificial intelligence initiatives. Giants such as OpenAI or Anthropic can already build similar capabilities in-house, but the same is not easy for small developer teams. This initiative will allow them to make full use of Wikidata's wealth of data. It should also lead to a more balanced competitive environment in the field of artificial intelligence.

The most notable aspect of this transformation is that it offers a chance to surface topics that go unseen on the internet. Most artificial intelligence models focus on popular content, while niche topics remain in the background. Wikidata's deep structure, however, allows a far wider range of topics to stand out. In this way, the diversity of knowledge increases and new opportunities arise for research in different fields. This may be especially beneficial for education, culture and public services.

Govdirectory is one concrete example of this potential. The platform makes the contact information of public officials around the world available, using data compiled by volunteers. Data from Wikidata contributes greatly to transparency and access. Citizens can find out how to reach their officials from a single point, without having to consult multiple sources. The model also serves as a source of inspiration for other projects.

From a technical point of view, the process has some notable details. A model developed by Jina AI converted the data into vectors that capture contextual meaning. DataStax, a subsidiary of IBM, provided free infrastructure to store this data. Although the current dataset only covers content up to September 18, 2024, this does not undermine the validity of the database. Small changes since then have a limited effect on the overall embedding, so the usability of the data is not impaired.
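To illustrate what a vector format makes possible, here is a minimal sketch of similarity search over embeddings. It is not the project's actual pipeline: the three-dimensional vectors, the `ITEM_VECTORS` table and the `nearest_items` helper are hypothetical stand-ins, and real embedding models produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the values are invented for illustration only.
ITEM_VECTORS = {
    "Berlin": [0.9, 0.1, 0.0],
    "Hamburg": [0.8, 0.2, 0.1],
    "Photosynthesis": [0.0, 0.1, 0.9],
}


def nearest_items(query_vector, vectors, top_k=2):
    """Rank items by cosine similarity to the query, most similar first."""
    scored = [
        (label, cosine_similarity(query_vector, vec))
        for label, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# A query vector that points in roughly the same direction as the two cities.
results = nearest_items([1.0, 0.0, 0.0], ITEM_VECTORS)
```

In this toy setup the two city items rank above the unrelated one because their vectors point in a similar direction. That is the sense in which a vector format lets a system compare items by meaning rather than by exact keywords.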

The Wikimedia team is waiting attentively for developers' feedback. Based on that feedback, the database will be enriched in future updates, and the data added over the past year will be brought into the vector system. Even in its current state, however, the resource offers a strong foundation. In particular, the emphasis on contextual knowledge is a major advantage for artificial intelligence.

The most striking aspect of this development is its potential to change how artificial intelligence applications access information. They will no longer just find data; they will also be able to interpret it in the correct context. This should provide more reliable results for both developers and users. However, the project's success will depend directly on how developers use the system.

Above all, the greatest value of the initiative is that it paves the way for fairer access to open information. Narrowing the gap between large companies and small teams will create diversity in the ecosystem. Wikidata, which is constantly updated through the contributions of volunteers, will continue to be a sustainable source of information. The approach will also broaden the scope of future information-based projects.

Looking ahead, the vector-based version of Wikidata is expected to enable more inclusive solutions in the artificial intelligence world. This is an important development not only in technological terms, but also for the democratization of access to information. In particular, this opportunity for small developers can contribute to a more equal digital information environment. Users, in turn, will be able to reach more diverse and reliable content.

