Categorizing Wikipedia Articles - Upwork

Budget: 501-1000 EUR
Created: 2019-02-27
Ends: 2019-03-06
Offered by: None
Description: Greetings. I have accumulated a collection of ~40,000 random, uncategorized Wikipedia article URLs.

I'd like to sort these URLs and assign each one to its respective category.

I have established some general parent-categories which I feel the articles should fall under.

Architecture, Arts, Film and Music

Communication, Education and Literature

Companies and Organizations

Economics and Finance

Energy and Environment

Food and Drink

Geography and Places

Health and Medicine

Law and Politics

Media (Books, Movies and TV)

Philosophy, Religion and Spirituality

Recreation and Sports

Science and Technology

Social Science (Anthropology, History and Sociology)

These are just the parent categories; each article should then be sorted into a sub-category as well (for example: in Mathematics - Probability, Geometry, etc.; in Geography - Cities, National Parks, Islands, etc.; in Religion - Buddhism, Judaism, etc.; in Technology - Networking, AI, etc.; in People - Business, Sports, Politics, etc.).

The URLs are in a plain text format (.txt) and the output can be the same.
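As a possible starting point, the plain-text input could be loaded along these lines. This is only a minimal sketch, not a prescribed method: the helper names `title_from_url` and `read_urls` are hypothetical, and it assumes one URL per line using the usual `/wiki/<Title>` path form.

```python
from urllib.parse import urlparse, unquote


def title_from_url(url: str) -> str:
    """Return the article title encoded in a Wikipedia article URL."""
    path = urlparse(url.strip()).path
    prefix = "/wiki/"  # assumption: standard article path form
    if not path.startswith(prefix):
        raise ValueError(f"not a Wikipedia article URL: {url!r}")
    # Decode percent-escapes, then turn underscores back into spaces.
    return unquote(path[len(prefix):]).replace("_", " ")


def read_urls(lines):
    """Yield non-empty, de-duplicated URLs in their original order."""
    seen = set()
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            yield url
```

Passing an open file object to `read_urls` (for example one opened from a hypothetical `urls.txt`) would stream the URLs without loading all ~40,000 lines at once.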






[Communication - Journalism]

[Communication - Literature]

[Companies - Financial]

[Companies - Media]

[Companies - Technology]

[Companies - Transport]

[Finance - Foreign Exchange]

[Finance - Insurance]

[Finance - Options]

[Finance - Taxation]

[Health - Diseases]

[Health - Sleep]

[Geography - Parks]

[Geography - Salt Flats]

[Geography - Valleys]

[People - Artist]

[People - Businessmen]

[People - Philosopher]

[People - Politics]

[People - Sports]


[Philosophy - Concepts]

[Religion - Buddhism]

[Religion - Hinduism]


The above example has to be applied to ~40,000 URLs. Avoiding over-categorization is a must. Strive to keep the sub-categories broad and most relevant.
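To illustrate the expected output, here is a rough sketch of how categorized URLs might be written back out under `[Parent - Sub]` headings, with a simple check for over-categorization. It assumes the category assignments already exist (however they are produced); the function names and the `min_size` threshold are hypothetical choices, not requirements.

```python
from collections import defaultdict


def group_by_label(assignments):
    """Group (url, parent, sub) tuples by their (parent, sub) label."""
    groups = defaultdict(list)
    for url, parent, sub in assignments:
        groups[(parent, sub)].append(url)
    return groups


def render(groups):
    """Render groups as plain text: a [Parent - Sub] heading, then its URLs."""
    lines = []
    for parent, sub in sorted(groups):
        lines.append(f"[{parent} - {sub}]")
        lines.extend(groups[(parent, sub)])
        lines.append("")  # blank line between sections
    return "\n".join(lines)


def small_subcategories(groups, min_size=5):
    """List labels holding fewer than min_size URLs (an assumed threshold).

    Such labels are candidates for merging into a broader sub-category.
    """
    return sorted(label for label, urls in groups.items() if len(urls) < min_size)
```

Labels flagged by `small_subcategories` could be reviewed by hand and folded into a broader sub-category, which keeps the final set broad and relevant.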

Here is another example of a categorized set:

While researching for ways to execute this task myself, I came across a few links which may be useful:

I don't know the best approach to this task, so kindly propose and demonstrate your method.

Kindly contact me if any further clarification is needed.

Thank you for your interest. Good day!

Budget: $400

Posted On: February 27, 2019 05:00 UTC
Category: Data Science & Analytics > Other - Data Science & Analytics

Skills: Data Entry, Data Mining, Data Scraping, Natural Language Processing, Wikipedia

Job Type(s):
  • PHP
  • CSS
Database:
Operating System: Linux
Number of Bids: 0