Beyond .gitattributes: Getting GitHub to Recognize Your New Programming Language
Creating a new programming language is an impressive feat, and developers naturally want their creations to be recognized by platforms like GitHub. A recent discussion in the GitHub Community highlighted a common challenge: how to get GitHub to display a brand-new language in a repository's 'Languages' section. This insight explains why .gitattributes alone is not enough and what language creators can do to get accurate language statistics for their projects.
The Challenge: Custom Languages and GitHub's Linguist
User flaviokalleu, the maintainer of 'Flang' (a bilingual declarative programming language with the .fg extension), wanted GitHub to recognize Flang in their repository. They had added a .gitattributes file containing *.fg text linguist-language=Flang, expecting GitHub to pick it up, but the 'Languages' section remained unchanged.
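For reference, the attribute syntax itself is simple. The minimal Python sketch below uses a hypothetical helper, parse_gitattributes_line (not part of Git or Linguist), to split a line the way the gitattributes(5) manual describes: a pattern followed by attributes that are set, unset with a leading '-', left unspecified with '!', or given a value with '='. It shows that Git records linguist-language=Flang as an ordinary string value; whether that value means anything is up to Linguist.

```python
from __future__ import annotations


def parse_gitattributes_line(line: str) -> tuple[str, dict[str, object]] | None:
    """Split one .gitattributes line into its pattern and attribute settings.

    Basic syntax from gitattributes(5):
      attr       -> set (True)
      -attr      -> unset (False)
      !attr      -> unspecified (None)
      attr=value -> set to the string value
    Quoted patterns and macro attributes are not handled in this sketch.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank lines and comments carry no attributes
    pattern, *specs = stripped.split()
    attrs: dict[str, object] = {}
    for spec in specs:
        if spec.startswith("-"):
            attrs[spec[1:]] = False
        elif spec.startswith("!"):
            attrs[spec[1:]] = None
        elif "=" in spec:
            key, value = spec.split("=", 1)
            attrs[key] = value
        else:
            attrs[spec] = True
    return pattern, attrs


pattern, attrs = parse_gitattributes_line("*.fg text linguist-language=Flang")
print(pattern)  # *.fg
print(attrs)    # {'text': True, 'linguist-language': 'Flang'}
```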
The core of the problem lies in how GitHub determines language statistics. GitHub uses an open-source library called GitHub Linguist. As community member Gecko51 clarified, the linguist-language attribute in .gitattributes is designed for remapping files to a language that Linguist already knows. It cannot define a completely new language from scratch.
*.fg linguist-language=Python
For example, if you wanted your .fg files to be counted as Python, the above line would work. But for a truly new language like Flang, Linguist has no existing definition to map to, rendering the attribute ineffective for full recognition.
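The difference can be modeled with a toy lookup. In the Python sketch below, KNOWN_LANGUAGES is a hypothetical, heavily trimmed stand-in for the languages Linguist ships with, and resolve_override is an illustrative helper rather than Linguist's actual code: an override resolves only when it names a language that already exists, so Flang resolves to nothing.

```python
from __future__ import annotations

# Hypothetical, heavily trimmed stand-in for the languages Linguist already knows.
# "Flang" is deliberately absent, mirroring the situation in the discussion.
KNOWN_LANGUAGES = ["Python", "JavaScript", "Fortran", "Ruby"]

# Index by lowercase name so this sketch's lookups are case-insensitive.
_BY_LOWER_NAME = {name.lower(): name for name in KNOWN_LANGUAGES}


def resolve_override(value: str) -> str | None:
    """Toy model: an override can only point at a language that already exists.

    Returns the canonical language name, or None when the name is unknown;
    the override has no way to create a new language entry.
    """
    return _BY_LOWER_NAME.get(value.strip().lower())


for value in ("Python", "python", "Flang"):
    print(f"linguist-language={value} -> {resolve_override(value)}")
# linguist-language=Python -> Python
# linguist-language=python -> Python
# linguist-language=Flang -> None
```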
The Solution: Contributing to GitHub Linguist
The definitive path to getting a new language recognized by GitHub is to contribute it directly to the github-linguist/linguist repository. This ensures that GitHub's core language detection engine learns about your language.
Key Steps for Linguist Submission:
Gecko51 outlined the essential requirements for a successful submission:
- TextMate/VS Code Grammar: A .tmLanguage or .plist file that defines syntax highlighting for your language's files (e.g., .fg). Linguist pulls grammars in as Git submodules under vendor/grammars/, so the grammar usually lives in its own public repository.
- Sample Files: Provide a sufficient number of small, non-trivial programs in your language within a dedicated folder (e.g., samples/Flang/). These samples demonstrate real-world usage and are used to train and test Linguist's language classifier.
- Entry in languages.yml: Add a definition for your language in the languages.yml file. This entry includes crucial metadata such as the language name, file extensions, type (e.g., programming, markup), preferred display color, and more.
- Proof of Usage: Linguist generally looks for evidence of real-world adoption. Having a reasonable number of public repositories already using your language can significantly strengthen your case. This is often where newer languages face initial hurdles.
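To make the grammar requirement concrete, here is a minimal Python sketch that builds a tiny TextMate grammar as a dictionary (the same structure VS Code stores in .tmLanguage.json files) and runs a simplified version of TextMate's matching rule over one line: at each step the rule whose match starts earliest wins, with ties going to the rule listed first. Flang's real syntax is not described in the discussion, so the comment marker, keywords, and scope names are invented for illustration, and Python's re module stands in for the Oniguruma engine that TextMate grammars actually target.

```python
from __future__ import annotations

import json
import re

# A deliberately tiny TextMate grammar for Flang. The discussion does not describe
# Flang's syntax, so the comment marker, keywords, and scope names are invented.
grammar = {
    "name": "Flang",
    "scopeName": "source.flang",
    "fileTypes": ["fg"],
    "patterns": [
        {"name": "comment.line.number-sign.flang", "match": r"#.*$"},
        {"name": "string.quoted.double.flang", "match": r'"(?:[^"\\]|\\.)*"'},
        {"name": "keyword.control.flang", "match": r"\b(?:if|else|while)\b"},
        {"name": "constant.numeric.flang", "match": r"\b\d+(?:\.\d+)?\b"},
    ],
}


def tokenize(line: str) -> list[tuple[str, str]]:
    """Simplified TextMate-style scan of a single line.

    At each step, the rule whose match starts earliest wins; ties go to the
    rule listed first. Text that no rule matches is left unscoped.
    """
    rules = [(rule["name"], re.compile(rule["match"])) for rule in grammar["patterns"]]
    tokens = []
    pos = 0
    while pos < len(line):
        best = None
        for name, regex in rules:
            match = regex.search(line, pos)
            if match and match.end() > match.start():
                if best is None or match.start() < best[1].start():
                    best = (name, match)
        if best is None:
            break  # nothing else on the line is scoped
        name, match = best
        tokens.append((name, match.group()))
        pos = match.end()
    return tokens


print(json.dumps(grammar, indent=2))  # the JSON form of the grammar
print(tokenize("if x > 10 # check"))
# [('keyword.control.flang', 'if'), ('constant.numeric.flang', '10'), ('comment.line.number-sign.flang', '# check')]
```

Because the string rule matches at position 0, a line such as "a # b" wrapped in double quotes is scoped entirely as a string rather than as a comment, which is the ordering behavior this sketch is meant to show.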
For detailed instructions, always refer to the official CONTRIBUTING.md guide in the Linguist repository.
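The languages.yml entry itself is a short YAML mapping. The Python sketch below renders one from a dictionary so its shape is easy to see; render_languages_yml_entry is an illustrative helper, not Linguist tooling, and every value (the color, tm_scope, and language_id in particular) is a hypothetical placeholder. CONTRIBUTING.md remains the authority on which fields are required.

```python
from __future__ import annotations


def render_languages_yml_entry(name: str, fields: dict[str, object]) -> str:
    """Render one entry in the layout used by Linguist's languages.yml.

    Only str, int, and list-of-str values are handled in this sketch. Values
    starting with '#' are quoted (YAML would otherwise read them as comments),
    and values starting with '.' are quoted to match the file's existing style.
    """

    def scalar(value: object) -> str:
        text = str(value)
        if isinstance(value, str) and text[:1] in ("#", "."):
            return f'"{text}"'
        return text

    lines = [f"{name}:"]
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"  {key}:")
            lines.extend(f"  - {scalar(item)}" for item in value)
        else:
            lines.append(f"  {key}: {scalar(value)}")
    return "\n".join(lines)


# Every value here is a placeholder for illustration, not an approved entry.
flang_entry = {
    "type": "programming",
    "color": "#4a90d9",          # hypothetical display color
    "extensions": [".fg"],
    "tm_scope": "source.flang",  # should match the grammar's scopeName
    "ace_mode": "text",
    "language_id": 0,            # placeholder; see CONTRIBUTING.md for how IDs are assigned
}

print(render_languages_yml_entry("Flang", flang_entry))
# Flang:
#   type: programming
#   color: "#4a90d9"
#   extensions:
#   - ".fg"
#   tm_scope: source.flang
#   ace_mode: text
#   language_id: 0
```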
Important Considerations:
- Naming Conflicts: Be aware of potential naming conflicts. As noted in the discussion, "Flang" is also the name of an LLVM-based Fortran compiler. Reviewers may raise this, so be prepared to discuss or differentiate your language.
- No Local Workaround for Full Recognition: Until your pull request is merged into Linguist and included in a release that GitHub deploys, there is no local .gitattributes workaround that will make GitHub display your language as a distinct entry in the 'Languages' section.
Temporary Workarounds for Language Stats
While waiting for full Linguist integration, Hamdan-Saddique-ai suggested a practical workaround if you primarily need your files to contribute to *some* language statistic:
*.fg linguist-language=Python
Mapping your custom extension to an existing language (e.g., Python, JavaScript, or any language with similar syntax) gets your files counted under that language. This doesn't give your custom language true recognition, but it does make your code show up in the repository's language breakdown, albeit attributed to the wrong language.
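The effect of this workaround can be illustrated with a toy model of GitHub's language bar, which weights languages by bytes of code. In the Python sketch below, the repository contents, the extension table, and the language_bytes and percentages helpers are all hypothetical, and fnmatch stands in for gitattributes pattern matching. Without the override the .fg files are simply not counted; with it, their bytes are added to Python's total.

```python
from __future__ import annotations

import fnmatch
import os
from collections import Counter

# Hypothetical repository contents: path -> size in bytes.
FILES = {
    "src/main.fg": 4200,
    "src/util.fg": 1800,
    "tools/build.py": 1000,
    "web/app.js": 3000,
}

# Simplified extension-based detection; Linguist's real strategies are richer.
DETECTED = {".py": "Python", ".js": "JavaScript"}


def language_bytes(files: dict[str, int], overrides: dict[str, str]) -> Counter[str]:
    """Sum file sizes per language, letting gitattributes-style overrides win.

    `overrides` maps a glob such as "*.fg" to a language name. A file with no
    override and no detected language is left out of the totals.
    """
    totals: Counter[str] = Counter()
    for path, size in files.items():
        language = None
        for pattern, name in overrides.items():
            if fnmatch.fnmatch(path, pattern):
                language = name  # later matching patterns win, as in .gitattributes
        if language is None:
            language = DETECTED.get(os.path.splitext(path)[1])
        if language is not None:
            totals[language] += size
    return totals


def percentages(totals: Counter[str]) -> dict[str, float]:
    """Convert byte totals to percentages, largest language first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {name: round(100 * count / grand_total, 1) for name, count in totals.most_common()}


print(percentages(language_bytes(FILES, {})))
# {'JavaScript': 75.0, 'Python': 25.0}
print(percentages(language_bytes(FILES, {"*.fg": "Python"})))
# {'Python': 70.0, 'JavaScript': 30.0}
```

In this toy repository the override flips the headline language from JavaScript to Python, which is exactly the kind of distortion the workaround trades for visibility.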
Conclusion
For creators of new programming languages, achieving proper recognition on GitHub is a multi-step process that extends beyond local .gitattributes configuration. The definitive path is contributing to the GitHub Linguist project with a grammar, sample files, and a languages.yml definition. The process takes effort and community engagement, but it is the only way to have your language listed under its own name in a repository's 'Languages' section.
