Latest update: 16 September 2024
This page gives more information about:
- The structure of the GLAMorousToHTML repository, its files and folders
- A short description of what they do (see the docstrings for more detailed functional descriptions)
- How to run this repo yourself
- The change log
- Features to be added
What are the main files and folders in this repo, and what do they do?
- GLAMorousToHTML.py: The main script
- GLAMorousToHTML_functions.py: Functions specific to the main script
- category_logo_dict.json
- category_logo_dict_nde.json
- README.md: This file
- GLAMorous_MediacontributedbyKoninklijkeBibliotheek_Wikipedia_Mainnamespace_10012024.html
- site
- site/nde
- site/logos
- site/flags
- data
- data/nde
- data/nde/aggregated
- reports
- stories
To follow...
- Reports
- Added reports for partners of NDE, the Dutch Network for Digital Heritage.
- Moved all GLAM reports to separate folder and page.
- Added report for Leiden University Library
- In those reports, removed underscores from article URLs.
- Code
- Refactoring:
- Reduced general.py to only contain very general functions that can be used everywhere in this project.
- Grouped all functions specific to the GLAMorousToHTML.py module into the dedicated module GLAMorousToHTML_functions.py.
- Added type annotations for many functions.
- Made the GLAMorous 'Search depth' parameter configurable via category_logo_dict.json. The default is '0' (no subcategories). For Category:Collections of Leiden University Library the depth is 5, so files up to 5 subcategory levels deep are also taken into account.
- Included reports for 14 institutions from Australia and New Zealand.
- Included reports for institutions from Norway, Sweden and Finland.
- README.md: Added an explanation of how you can run the script yourself.
- Refactored all code into multiple separate modules: setup.py, general.py, buildHTML.py and buildExcel.py. This has significantly reduced the complexity of the main script GLAMorousToHTML.py and made the code as a whole more modular and easier to understand, maintain and expand.
- Moved all HTML report pages into a separate site/ folder. This has made the repo cleaner, clearer and more maintainable.
- Created five HTML files that redirect the old KB HTML pages (from 27-01-2022 to 16-01-2024) to their new equivalents in the site/ folder. Redirection was not implemented for other institutions.
- As of 14-02-2024, added Excel outputs to the data/ folder, to be used as structured input for data applications such as OpenRefine.
- In the process of updating the data structure in category_logo_dict.json; the new structure can be seen under the 'Netherlands' key.
- Improved pagetemplate.html to be key-based ({numarticles} Wikipedia articles) rather than index-based ({0} Wikipedia articles).
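The difference between index-based and key-based placeholders mentioned in the last change-log item can be illustrated with Python's `str.format`. This is only a minimal sketch: the template strings, the `numlanguages` placeholder and the function names are hypothetical and are not taken from pagetemplate.html or the actual build code; only `{numarticles}` and `{0}` appear in the change log.

```python
# Hypothetical template fragments; the real pagetemplate.html is not reproduced here.
INDEX_BASED = "<p>{0} Wikipedia articles in {1} languages</p>"
KEY_BASED = "<p>{numarticles} Wikipedia articles in {numlanguages} languages</p>"


def render_index_based(numarticles: int, numlanguages: int) -> str:
    # Positional placeholders: the argument order must match the template exactly.
    return INDEX_BASED.format(numarticles, numlanguages)


def render_key_based(numarticles: int, numlanguages: int) -> str:
    # Named placeholders: each value is bound by key, so argument order is irrelevant
    # and the template documents itself.
    return KEY_BASED.format(numlanguages=numlanguages, numarticles=numarticles)


if __name__ == "__main__":
    print(render_index_based(12, 3))
    print(render_key_based(12, 3))
```

Both functions produce the same HTML, but the key-based template keeps working when placeholders are added or reordered. Note that with `str.format`, literal braces in the HTML (for example in inline CSS) have to be doubled as `{{` and `}}`.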
- Export reports to Wiki format and put on Commons (work in progress):
- Index page for all institutions
- Index page for KB, related to Category:Media contributed by Koninklijke Bibliotheek.
- KB report dated 14 Feb 2024, for this category.
- Create all Datawrapper visualisations via the API
- Add page/file request for category trees. Explore https://doc.wikimedia.org/generated-data-platform/aqs/analytics-api/reference/commons.html