The Indegenous was created with the goal of disseminating knowledge about indigenous peoples. This database is an ongoing effort to update information about all the indigenous peoples of the Indian subcontinent. For the purpose of this dataset, we use the terms ‘indigenous peoples’ and ‘tribal’ interchangeably. However, it must be noted that the boundaries between the definitions of these terms can be delicate.
We are releasing this dataset for researchers and students who are interested in demographic research about India. The motivation for compiling it comes from the finding that comprehensive and harmonized data sources about indigenous populations are lacking on the internet.
The North Indian indigenous peoples database includes communities from Himachal Pradesh, Ladakh, Jammu and Kashmir, Uttarakhand and Rajasthan. Although Rajasthan is often considered part of West India, the Indian administrative system groups it with the rest of the North Indian region. Notably absent are the ethnic majorities of Punjab and Haryana, who identify not within the tribal system but within the ethnic minority system of "scheduled castes" defined by the Government of India. Over time, the tribal identities of populations in those two states have diminished or disappeared with the widespread adoption of the caste system, in which the sense of belonging is stratified and codified under Hindu law.
The Central Indian indigenous peoples database includes communities from Madhya Pradesh, Chhattisgarh, Uttarakhand and Uttar Pradesh. Notably absent are the ethnic majorities, who identify not within the tribal system but within the ethnic minority system of "scheduled castes" defined by the Government of India. Over time, the tribal identities of populations in these states have diminished or disappeared with the widespread adoption of the caste system, in which the sense of belonging is stratified and codified under Hindu law.
The Eastern Indian indigenous peoples database includes communities from Bihar, West Bengal, Odisha (formerly Orissa) and Jharkhand. Notably absent are the ethnic majorities, who identify not within the tribal system but within the ethnic minority system of "scheduled castes" defined by the Government of India. Over time, the tribal identities of populations in these states have diminished or disappeared with the widespread adoption of the caste system, in which the sense of belonging is stratified and codified under Hindu law.
The South Indian indigenous peoples database compiles information from the following states and union territories:
Andhra Pradesh, Andaman and Nicobar Islands, Dadra and Nagar Haveli, Daman and Diu, Goa, Karnataka, Tamil Nadu and Telangana. Some communities are also present in other parts of the country because of historical migration between kingdoms and, later, under the British Empire; for the purpose of this dataset they have been listed separately. Depending on historical migration patterns, lifestyles and languages have adapted to those of the larger ethnic majority, indicating a fluid identity between membership of a tribe and the larger ethnic or national identity defined by a language.
This database consists of demographic information from the states of Gujarat and Maharashtra. As in South India, large migration patterns indicate an exchange of languages and knowledge between several communities, and indigenous identities may have been lost to the ethnic majority. This would explain the diminished numbers of several populations and their integration into the state or national identity through the Marathi and Gujarati languages. As an exception, the Bhili languages and their clans stand out even though no written script exists. Notably, the last census of several populations was conducted in 1962-63, which suggests that some information is outdated.
Globally, indigenous peoples are defined in relation to colonialism as the original peoples of a land, having distinct cultures and ways of life. We believe that this description does not necessarily hold true for all populations, because several indigenous peoples are migratory and cannot be confined to a particular geographical state or region. Additionally, it adds a negative connotation to the self-identification of indigenous people.
In India, the Constitution grants special rights to indigenous peoples, who are described as having the following traits:
Source: Vikaspedia (a Government of India initiative)
However, not all of them have been granted ‘Scheduled Tribes’ status by the Government of India. This special status is given to communities that have historically been left behind by national policy, with the purpose of increasing their representation in public life.
This dataset takes a critical realist approach by contrasting the definition of ‘indigenous person’ as dictated by political entities with an ontological examination of the identity of an ‘indigenous person’.
We at the Indegenous define ‘indigenous peoples’ from a cultural point of view rather than by governmentally recognized scheduled tribe status. That is why the dataset includes indigenous peoples that do not yet have ‘scheduled tribe’ status. This approach might be considered biased or incomplete, since the hierarchies and relationships between different tribal populations are complex. In several instances, insufficient data made it difficult to include tribes that are perhaps sub-tribes or clans of other tribes. There is also a discrepancy between the way tribal populations view themselves and the way they are portrayed in literature written by ‘outsiders’. By ‘outsider’, we mean a researcher trained in Western research methods who takes the position of an observer without considering context or Indigenous sentiment. According to Indigenous scholars, Western research training requires adaptation to fit Indigenous contexts (Kovach, 2010; Simonds and Christopher, 2013; Wilson, 2008). Western research needs a significant process of decolonization, based on lessons learned from Indigenous community partners who have voiced concern over its methods (Smith, 1999; Tuck and Yang, 2012).
In order to decolonize the way research is conducted, we have included self-reported information from blogs written by indigenous peoples, since we agree with Ranjan Datta that decolonization “is a continuous process of anti-colonial struggle that honors Indigenous approaches to knowing the world, recognizing Indigenous land, Indigenous peoples, and Indigenous sovereignty”.
Therefore, we keep a comprehensive view of indigenous populations by using inclusion criteria that answer the following questions:
1. Do they self-identify as an indigenous community?
2. Does this community have a distinct language? If yes, does it have dialects?
3. Are they known to have distinct traditions?
4. Do they celebrate distinct festivals and have rituals that can be differentiated?
5. Are their food habits and cuisines distinct?
6. Do they have a history of experiencing economic or cultural bias?
After identifying the indigenous tribes, we applied a mixed-methods approach combining qualitative and quantitative data.
This dataset comprises both primary sources (interviews with indigenous peoples, whose details are given in the sources section) and secondary sources (a literature study of earlier research on these indigenous populations).
Notes to researchers about bias:
As far as possible, we have relied on National Census data to present indigenous populations without bias. However, as mentioned before, this data is often incomplete for several reasons:
1. Lack of official government recognition
2. Minimal contact of indigenous populations with government entities
3. Lack of academically rigorous sociological research
Where government data isn’t available, we have relied on academic research, journal publications and newspaper articles. All sources have been noted in the sources section.
Where even academic research does not suffice, we have relied on information found on self-identified blogs and in NGO research. In these cases the accuracy of the information may be reduced by the lack of academic rigor in reporting. Researchers are therefore advised to use reasonable judgement before citing these sources. To make this easier, a comments section notes where such cases occur.
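The order of preference described above (census data first, then academic and press sources, then self-identified blogs and NGO research) can be sketched as a small ranking. This is only an illustration: the tier names and helper functions below are hypothetical and are not part of the dataset, which records this information in its sources and comments sections.

```python
from enum import IntEnum
from typing import List, Optional


class SourceTier(IntEnum):
    """Hypothetical tiers; a lower value means a more preferred source."""
    CENSUS = 1         # National Census data
    ACADEMIC = 2       # academic research, journal publications, newspaper articles
    SELF_REPORTED = 3  # self-identified blogs and NGO research


def best_source(available: List[SourceTier]) -> Optional[SourceTier]:
    """Return the most preferred tier among those found for a community."""
    return min(available) if available else None


def needs_caution(tier: SourceTier) -> bool:
    """Flag the tier for which reasonable judgement is advised before citing."""
    return tier == SourceTier.SELF_REPORTED
```

A researcher filtering the data might, for example, keep only records for which `needs_caution` is false, or report them separately.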
This section is incomplete due to lack of sources and is subject to review. Unless otherwise noted, the official population data comes from the 2011 Government of India census. Otherwise, the source of the population information is given in the comments section.
There are instances where a tribe is either a family of tribes or a sub-tribe of another. These have been a source of complexity that this dataset fails to address, and we recognize that simplifications may have been made in these cases. Specific cases have been noted in the comments and other names sections. Additionally, in the other names section, we have listed clans and sub-tribes as other names.
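Because clans and sub-tribes are recorded as other names, a lookup by name should check both the main name and the other names of each entry. The sketch below assumes a hypothetical record layout with `name` and `other_names` keys, and uses placeholder names rather than real communities; the actual field names in the dataset may differ.

```python
from typing import Dict, List


def find_by_name(records: List[Dict], query: str) -> List[Dict]:
    """Return the records whose main name or any other name equals the query.

    Matching ignores case and surrounding whitespace.
    """
    wanted = query.strip().casefold()
    hits = []
    for record in records:
        names = [record["name"], *record.get("other_names", [])]
        if any(wanted == n.strip().casefold() for n in names):
            hits.append(record)
    return hits


# Placeholder entries, not real data.
records = [
    {"name": "Tribe A", "other_names": ["Clan A1", "Clan A2"]},
    {"name": "Tribe B", "other_names": []},
]
matches = find_by_name(records, "clan a1")
```

Here `matches` contains only the "Tribe A" entry, since "Clan A1" is listed among its other names.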
Tribal family relationships are extremely complex and may not be classifiable by state, region or even language.
For example, there are tribes that we struggled to find information about. To our knowledge, some of them are not tribes in themselves but rather clans that can be traced back to a single family and that distinguished themselves from their relatives through an inheritance-based history rather than through sociologically proven methods.
Due to the nature of this database, it was not possible for us to show hierarchical structures, and this research therefore remains to be undertaken. This type of research is not quantifiable and is better done as case studies. We welcome any and all discussions, opinions and corrections of the data we have cited.
References:
Kovach, M (2010) Indigenous Methodologies. Toronto, ON: University of Toronto Press.
Simonds, VW, Christopher, S (2013) Adapting western methods to Indigenous contexts. American Journal of Public Health 103(12): 2185–2192.
Smith, LT (1999) Decolonizing Methodologies: Research and Indigenous Peoples. London: Zed Books.
Tuck, E, Yang, KW (2014) R-words: Refusing research. In: Paris, D, Winn, MT (eds) Humanizing Research: Decolonizing Qualitative Inquiry with Youth and Communities. Los Angeles, CA: SAGE, 223–247.
Wilson, S (2008) Research is Ceremony: Indigenous Research Methods. Winnipeg, MB: Fernwood.