Your feedback is very important while the pilot version of Geirfan is being tested. Whether you have criticisms, compliments, or questions, it would be great to hear from you! Reach Geirfan by email, or come and have a conversation on Mastodon or on Twitter.

About Geirfan

Geirfan is a dictionary of some of the most frequent words in Welsh. The content is aimed at adult learners, as well as anyone who needs a clear, comprehensive explanation of common Welsh words.

The questions and answers on this page will help you understand what Geirfan is and how to use it. If you have questions that aren't yet covered here, please do get in touch. The site will be in development for the next few years, and your feedback and questions will be crucial in shaping Geirfan's future.

Welcome from the Editor, Bethan Tovey-Walsh

A warm welcome to the website of Geirfan, an online dictionary whose focus is on providing comprehensive information about the most commonly used contemporary Welsh words.

Geirfan began as a way to demonstrate the potential of data created as part of a research project that produced frequency-based wordlists from the CorCenCC corpus data. The enthusiasm of the project's advisory board, as well as useful feedback from learners of Welsh about prototype dictionary entries, inspired me to create this website as a place to develop dictionary content for learners (and others). I am grateful for the support of the Economic and Social Research Council for this original research project, whose output has been foundational in building Geirfan.

I would like to offer heartfelt thanks to Emyr Davies, Dawn Knight, Steve Morris, and Helen Prosser. Without their support and wise counsel, Geirfan might well never have existed. Their detailed feedback and guidance on the early dictionary entries were central to building an editorial policy for Geirfan.

On the technical front, thanks to Marc Morris from the National Centre for Learning Welsh for his support and for offering his time to discuss the early development of the website. Norm Tovey-Walsh has also been a huge help; he's an unparalleled rubber ducky! Many thanks, too, to Eirian Conlon from the National Centre for Learning Welsh, who recorded the northern Welsh audio clips for the first 60 dictionary entries. Steff Thomas gave detailed feedback on draft content for Geirfan, and Michael Sperberg-McQueen gave his invaluable imprimatur to the site design.

It goes without saying that any errors in Geirfan are solely my own responsibility.

Frequently Asked Questions
Why the name Geirfan?
Geirfan is a portmanteau word, made up of the words gair (word) and man (site): a Word Site.
You'll see the same change to the vowels of gair when it appears as part of other words, like geiriadur (dictionary) and geirfa (vocabulary). Man is a common element in words for places and locations, like canolfan (centre) and gwefan (website).
Who's responsible for Geirfan?
Geirfan was originally inspired by a project to create a learner-friendly list of frequently used Welsh words. The project was led by researchers from Cardiff University, including principal investigator Dawn Knight and lead researcher Bethan Tovey-Walsh, along with an advisory committee from the National Centre for Learning Welsh (Helen Prosser), the WJEC (Emyr Davies), and CorCenCC (Steve Morris).
The Geirfan website and the first batch of entries were created by Bethan Tovey-Walsh to show how the wordlist project's data could be used as the basis of Welsh-language learning materials.
How do you choose which words to include in Geirfan?
The Geirfan wordlist comes from the 500-word list created by the wordlist project mentioned above. It is therefore based on lists of the words most frequently encountered in modern Welsh, alongside feedback from Welsh-language tutors about the types of vocabulary which are most useful to their students.
There is currently a working list of around 600 words to be included. This list was compiled using data from CorCenCC, the National Corpus of Contemporary Welsh. CorCenCC is a collection of spoken, written, and electronic Welsh, collected within the past ten years. It is the largest corpus of modern Welsh in existence, and provides invaluable insights into how Welsh is used today.
From the raw lists of the most frequent words encountered in CorCenCC, a team worked to identify a core list of vocabulary which would be most useful for learners. This step included taking into consideration the opinions of Welsh-language teachers about the usefulness of various types of vocabulary, and developing principles for identifying high-frequency words which were nonetheless not suitable for a learner dictionary. If you would like to see the frequency lists, and find out more about the process of selecting learner-appropriate vocabulary, the project's results are available to download here. (An academic journal article is also in preparation; I will add a link here once it's available.)
The first sixty words added for the pilot project were chosen in order to illustrate the full range of content which will be added to Geirfan over time. There are therefore some closely related families of words, but also some which may seem randomly chosen. The latter are almost always examples of particular word types which were needed to test site functionality and to illustrate Geirfan's capabilities.
Who is Geirfan meant for?
The primary audience includes anyone who's learning Welsh. Because early entries will focus on the most common words in Welsh, the dictionary is likely to be most useful for beginners at this stage. However, the entries are comprehensive and provide a wealth of additional information, such as quotations, tips about usage, information about the origins of the words, and lists of related words. These features will be useful to learners at any level, and may also be of interest to fluent Welsh speakers.
Others who might find Geirfan useful include teachers, and parents of children in Welsh-medium education.
Is Geirfan suitable for children?
Geirfan aims to provide comprehensive information about the words listed in our dictionary, including any offensive meanings, or meanings related to sensitive topics. Including these meanings is essential so that learners can avoid accidentally using an offensive word or making an unintended double entendre. However, it does mean that you may prefer to preview the content before showing it to younger children.
Apart from the question of content, the language used for definitions and explanations in Geirfan is also likely to be difficult for younger children. Adult learners benefit from access to detailed information about the vocabulary they are learning. Geirfan's definitions therefore aim to be thorough and exhaustive, rather than simple. As a result, the content is probably not accessible to children until they are in their later teens.
In summary: once a child is at an age when they can benefit from using adult dictionaries, and once you are comfortable allowing them to do so, Geirfan may be appropriate for them. As with any other online content, however, please check out the site yourself if you have concerns about its suitability for your child.
How do you choose your example quotations?
There are three types of quotation in Geirfan:
  1. examples of a word in use, taken directly from CorCenCC
  2. examples of a word in use, adapted from CorCenCC
  3. examples invented to show a word in use
Type 1 quotations can be found in the CorCenCC corpus in the exact form in which they appear in Geirfan (apart from capitalizing the first letter of each sentence and adding final punctuation). Type 2 quotations are taken from CorCenCC but needed some changes to make them suitable for Geirfan. This might mean removing parts of a sentence to make it shorter, correcting typing errors, or replacing very unusual words with ones which a learner will find easier to understand.
Type 3 quotations are used only when there is no suitable material from CorCenCC. This is rare, because the words are chosen for their high frequency in the CorCenCC data. One of the commonest reasons for using a constructed example is to illustrate an unusual initial-consonant mutation.
Even when a Type 3 example must be used, it is based as far as possible on the CorCenCC data. Information about words which commonly appear together can be used, for example, so that the sentence features vocabulary that normally co-occurs with the focus word.
How big will Geirfan be when it's finished?
Six hundred entries is the initial target. After that, we shall see!