Enhancing GitHub Reports: A Conventional Commit Standard for Research Projects
Streamlining Scientific Workflows: Conventional Commits for Research
In the world of software development, Conventional Commits have become a cornerstone for creating clean, readable commit histories. They enable automated tools, improve collaboration, and provide valuable context for project evolution. However, what about research projects? A recent discussion on GitHub Community, initiated by user willdone-mt, highlights a significant gap: the existing Conventional Commits specification, designed for software, often fails to capture the unique nuances of scientific research.
The Challenge: A Need for Specialized Commit Types
willdone-mt embarked on a research project focusing on Git and quickly realized that scientific repositories lacked the structured commit practices common in software. The standard feat, fix, and chore types do not adequately represent changes in data acquisition, processing, analysis, or methodology. This realization led to the development of a specialized adaptation: Conventional Commits for Research.
This adaptation aims to provide a medium-weight convention for commit messages in research projects, making it easier to track scientific progress, automate documentation, and generate more meaningful GitHub reports on project activity. The proposed structure largely mirrors the original Conventional Commits:
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
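To make the header format concrete, here is a minimal Python sketch that splits a header line into its parts. It assumes the research adaptation keeps the original Conventional Commits header syntax unchanged, including the optional ! marker before the colon; the names parse_header and Header are hypothetical and not part of any published tool.

```python
import re
from typing import NamedTuple, Optional

# Header grammar borrowed from Conventional Commits:
#   <type>[optional scope]: <description>
# An optional "!" before the colon flags a breaking change (as in the original spec).
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"           # commit type, e.g. feat, process, dissem
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>.+)$"      # colon, one space, then the description
)


class Header(NamedTuple):
    type: str
    scope: Optional[str]
    breaking: bool
    description: str


def parse_header(message: str) -> Optional[Header]:
    """Parse the first line of a commit message; return None if it does not match."""
    first_line = message.splitlines()[0] if message else ""
    match = HEADER_RE.match(first_line)
    if match is None:
        return None
    return Header(
        type=match["type"],
        scope=match["scope"],
        breaking=match["breaking"] is not None,
        description=match["description"],
    )


print(parse_header("process(normalization): rescale spectra to unit area"))
```

Only the first line is inspected; the optional body and footers are left for other checks.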
Key Adaptations for Research Workflows
The core of this adaptation lies in its redefined and expanded commit types, tailored to scientific endeavors:
1. Universal Commit Types (Adapted)
- feat: Same as Conventional Commits; notably, also used for newly acquired raw data.
- fix: Same as Conventional Commits; acknowledges that data can also have bugs and require fixes.
- style, docs, chore, ci, revert: Retain their original or Angular convention meanings, with docs reserved specifically for non-scientific project management documents in research contexts.
2. Research-Specific Commit Types
- process: For changes related to data processing.
- analsynt: For data discussion, analysis, or synthesis.
- dissem: For dissemination documents (e.g., manuscripts, presentations).
- method: For materials and methods protocols or documents.
- expt: For experiments, tests, validation, and verification of data and methods (distinct from 'experimental research' as a methodology).
- research: A grouping type for research-only changes in code-focused projects.
3. Code-Specific Commit Types (Grouped for Research)
- code: Groups feat, fix, refactor, perf, and style when the project is code-focused.
- devtool: Groups build and test for development tool changes.
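One practical payoff of a fixed type list is feeding commit history into a report. The Python sketch below counts commits per type and sets aside headers whose type is not among those listed above. It accepts only the explicitly listed types as standalone; whether refactor, perf, build, and test may still be used on their own is not settled in the post, so leaving them out is an assumption. The helper tally_types is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Types listed in the adaptation, grouped as the post presents them.
UNIVERSAL_TYPES = {"feat", "fix", "style", "docs", "chore", "ci", "revert"}
RESEARCH_TYPES = {"process", "analsynt", "dissem", "method", "expt", "research"}
GROUPING_TYPES = {"code", "devtool"}
# Assumption: refactor, perf, build and test are not accepted on their own here.
ALLOWED_TYPES = UNIVERSAL_TYPES | RESEARCH_TYPES | GROUPING_TYPES

# Captures the type from "<type>[optional scope][!]: ..." header lines.
TYPE_RE = re.compile(r"^([a-z]+)(?:\([^()]*\))?!?: ")


def tally_types(headers: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count commits per recognized type; collect headers that do not fit."""
    counts: Counter = Counter()
    rejected: List[str] = []
    for header in headers:
        match = TYPE_RE.match(header)
        if match and match.group(1) in ALLOWED_TYPES:
            counts[match.group(1)] += 1
        else:
            rejected.append(header)
    return counts, rejected


example = [
    "feat(data): add field survey recordings",
    "process(filtering): drop sensor dropouts",
    "dissem(ch2): draft results section",
    "wip: misc changes",
]
counts, rejected = tally_types(example)
for commit_type, n in counts.most_common():
    print(f"{commit_type}: {n}")
print("unrecognized:", rejected)
```

In a real repository, the input could be the subject lines printed by git log --format=%s, split on newlines.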
4. Recommended Scopes
While optional, specific scopes are recommended to add further clarity:
- For feat and fix, the scope data is used to represent new or fixed data.
- For process, the scope should represent the stage or list of data processing.
- For dissem, the scope should represent the type of dissemination or part/chapter of a manuscript.
- For method, the scope should represent the stage or list of methods being used.
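The post does not say how a "list" of stages should be written inside a scope. Assuming a comma-separated list such as process(cleaning, normalization), a small Python sketch can recover the individual scope entries and flag feat or fix commits that carry the recommended data scope. Both helpers, scopes_of and touches_data, are hypothetical.

```python
import re
from typing import List

# "<type>(<scope>)[!]: ..." with the scope captured as raw text.
SCOPE_RE = re.compile(r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()]*)\))?!?: ")


def scopes_of(header: str) -> List[str]:
    """Return scope entries, splitting on commas (an assumed list syntax)."""
    match = SCOPE_RE.match(header)
    if match is None or not match["scope"]:
        return []
    return [part.strip() for part in match["scope"].split(",") if part.strip()]


def touches_data(header: str) -> bool:
    """True for feat/fix commits that carry the recommended data scope."""
    match = SCOPE_RE.match(header)
    return bool(match) and match["type"] in {"feat", "fix"} and "data" in scopes_of(header)


print(scopes_of("process(cleaning, normalization): rescale inputs"))
print(touches_data("fix(data): correct mislabeled station IDs"))
```

A report could use touches_data to separate data acquisitions and data corrections from code changes that share the same feat and fix types.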
5. Defining Breaking Changes in Research
A crucial adaptation is the definition of a BREAKING CHANGE for research: any modification to data processing, methods, or analysis workflow/procedure that invalidates prior results. This definition, though acknowledged as still 'misty and vague' by the author, provides a vital starting point for managing scientific reproducibility.
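Deciding whether a change invalidates prior results remains a human judgment, but once an author makes that call, the commit can be marked the way the original Conventional Commits specification allows: a ! before the colon in the header, or a BREAKING CHANGE: footer. Assuming the research adaptation keeps those markers, a minimal Python check might look like this; is_breaking is a hypothetical name.

```python
import re

# Header form: "<type>[optional scope]!: <description>"
HEADER_BANG_RE = re.compile(r"^[a-z]+(?:\([^()]*\))?!: ")
# Footer form: "BREAKING CHANGE: ..." (the original spec also accepts "BREAKING-CHANGE").
FOOTER_RE = re.compile(r"^BREAKING[ -]CHANGE: ", re.MULTILINE)


def is_breaking(message: str) -> bool:
    """Flag a commit as breaking via the header ! or a BREAKING CHANGE footer."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    if HEADER_BANG_RE.match(header):
        return True
    # Search only below the header so a description cannot trigger the footer rule.
    rest = "\n".join(lines[1:])
    return FOOTER_RE.search(rest) is not None


msg = (
    "process(filtering): change outlier cutoff from 3 to 2.5 SD\n"
    "\n"
    "BREAKING CHANGE: summary statistics in results/ must be regenerated"
)
print(is_breaking(msg))
```

The footer form leaves room to state which prior results need to be regenerated, which is the information a reproducibility-minded reader most needs.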
Refining the Standard: Community Feedback
The original post invited community review, and the author, willdone-mt, followed up with a self-reflection on 'Type Clarity Issues.' They questioned whether research and code types should be more distinctly separated and considered merging expt with test for compactness. This ongoing discussion highlights the iterative nature of developing such a standard and the importance of community input to ensure its practicality and adoption.
This initiative offers a promising path for researchers to leverage structured commit messages, improving project transparency, reproducibility, and the utility of GitHub reports for scientific endeavors. It's a testament to how developer productivity tools can be adapted to serve broader academic and research communities.
