Data Science with Bhagavad Gita


Editor’s note: Here is a peek into visualizations, high frequency words, alphabet dictionary and linguistic cues in the divine song that is Bhagavad Gita.

A friend once said, “Reading the Gita had downloaded enormous serenity in me. No matter what happens outside, it is still from inside.”

It takes a lot of courage and serendipity to read the Gita. I had long put it off for fear of the grammar, the vocabulary and the consequent clouding of attention. Arjuna traversed from the sorrowful aisles of vishada to the opulences of vibhuthi, like a bud unfolding each petal to welcome the spring. Perhaps the song of the Gita, the words that weave it and the translations will aid the blossoming of the Gita philosophy in the hearts of its seekers.

Arjuna was blessed with a vision to behold the divine cosmic form. At a very gross level, can we behold the words that weave this Song? For those who are bewildered by the vast grammar tables in Sanskrit, is there an entry way for Gita Parayanam and Bhaja Govindam without the obstacles of Dukrinkaranam (grammar)?

Recently I was reading about Twitter analysis with Data Science and it struck me: “How about doing it with Gita?”

How many words are there? What was the conversation style? Were there any grammar patterns to learn? How would an alphabet-wise dictionary look?


My knowledge of the Gita's meanings and recital comes from the handpicked verses taught in my Yoga course at SVYASA University a few years back. Composed in the Anushtup meter, there is a poise and resonance in its rendering. Based on what seemed like repeating patterns to my ear, I have conducted this experiment using the sandhi-free text of the Gita obtained from the Kaggle data science website.

I have used Python and data science modules for text parsing, iteration and bookkeeping of the patterns. Input patterns were provided based on familiar structures in the Gita. As the English character set is lossy and cannot represent all the characters of the Sanskrit alphabet accurately, the analysis uses IAST, an extended character set with diacritics that is used internationally for Sanskrit transliteration.
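Why the plain English character set is lossy can be seen in a small sketch. It does not touch the Kaggle file; `fold_to_ascii` is a hypothetical helper written only for this illustration, and the two sample words are chosen to make the point.

```python
import unicodedata

def fold_to_ascii(word: str) -> str:
    """Strip diacritics from an IAST word (hypothetical helper, for illustration)."""
    decomposed = unicodedata.normalize("NFD", word)
    # Drop combining marks such as the macron, the underdot and the acute accent.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# śastra (weapon) and śāstra (scripture) are different words in IAST ...
print(fold_to_ascii("śastra"), fold_to_ascii("śāstra"))
# ... but they fold to the same string once the diacritics are gone.
print(fold_to_ascii("śastra") == fold_to_ascii("śāstra"))
```

Keeping the diacritics avoids merging such words in the frequency counts.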

Data Science was used primarily for listing the high frequency words. Human intervention was used for the following:

  • sorting the high frequency words into 4 buckets (connectors, pronouns, nouns and verbs)
  • normalising the noun counts with the help of prefix tagging and data science
  • extracting high frequency verb lists using common suffix patterns in the context of dialogue in Gita


Let us begin with the number of chapters and verses in each.

About 10K words make up these 700 verses. Seems like a big number to digest. Like a Data Scientist, let us work in the Frequency Domain. Not all the 10K words are unique. Let us make a table of frequent words and their frequency.

ca 392 na 256 eva 173 aham 103 api 101 tat 93 karma 84 mām 84 yat 79 saḥ 74 sarva 67 hi 67 yaḥ 66 tu 66 iti 65 te 65 me 61 uvāca 64 mat 52 tathā 48 mahā 43 brahma 42 asmi 39 idam 39 pārtha 38 ātmā 37 param 36

Constructs of the Gita Language

On their own, these words do not tell us much. So let us classify them into 4 categories (connectors, pronouns, nouns and verbs), each with an appropriate theme.


The connector words in Sanskrit are immutable and hence do not need normalisation. Just like in English, the connector words seem to top the list and ambush the main message. But instead of discarding them, we can get a lot of insights from these words on the mood and the conversation style in the Gita.

Do we see one-sided preaching in the Gita? Were there any questions? Any negations? Were the sentences complex? Were there examples and illustrations?

and / certainly / indeed / therefore

ca 392 eva 173 api 101 hi 67 tu 66 evam 30 tasmāt 25 iti 65 vā 25 iva 18

if-else / when-then

yat 79 yadā 12 yathā 22 yatra 7 tat 93 tadā 12 tathā 48 tatra 13

Q&A / Negation

kaścit 9 na 256 mā 9 katham 9

High frequency connector words in Bhagavad Gita

Complex Sentence Connections [~850 occurrences]


ca (and) 392 evam (thus) 30 eva (only/indeed) 173 hi (certainly) 67 tu (indeed) 66 iti (like this) 65 tasmāt (therefore) 25


  1. abhyāsena tu kaunteya vairāgyeṇa cha … [Certainly with practise and with detachment]
  2. evam ukto hṛiṣhīkeśho [thus addressed] [1.24]
  3. karmaṇi eva adhikāras te [Your right is to action alone …] [2.47]
  4. iti ahaṁ vāsudevasya [Like this I have heard] [18.74]

Cause effect and illustrations [115 occurrences]

when-then, where-there, as-so

yadā tadā (when then) 24 yatra tatra atra kutra (where-there) 20
yathā tathā (just as-so too) 70


  1. yadā yadā hi dharmasya … tadā ātmānaṁ [When ever there is .. at that time][4.7]
  2. yatra yogeśhvaraḥ kṛiṣhṇo yatra pārtho … tatra śhrīr …[Where there is krishna & Arjuna, there is wealth …] [18.78]
  3. yathā ākāśha-sthito nityaṁ…tathā sarvāṇi bhūtāni … [Just like how the Sky is stationary… Like that All beings] [9.6].
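The pair totals quoted above can be rechecked from the individual counts in the connector table. This is only a bookkeeping sketch; the dictionary simply copies the numbers given in this article.

```python
# Counts copied from the connector table in this article.
connector_counts = {
    "yadā": 12, "tadā": 12,
    "yatra": 7, "tatra": 13,
    "yathā": 22, "tathā": 48,
}

# Each correlative pair: (relative word, its answering word).
pairs = [("yadā", "tadā"), ("yatra", "tatra"), ("yathā", "tathā")]

pair_totals = {
    f"{rel}-{ans}": connector_counts[rel] + connector_counts[ans]
    for rel, ans in pairs
}
print(pair_totals)  # {'yadā-tadā': 24, 'yatra-tatra': 20, 'yathā-tathā': 70}
```

The three pairs together give 114, in line with the roughly 115 occurrences quoted for this group.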

Lots of Q&A [305 occurrences]

no, do not, how, why

na (no) 256 mā (do not) 9 katham (how) 9 kim (why) 21


  1. mā phaleṣhu kadāchana [not entitled to the fruits] [2.47]
  2. na enaṁ chhindanti śhastrāṇi [not can the weapons tear] [2.23]

Miscellaneous [196 occurrences]

mahā 43 sarva 67 param 36 śrī 29 punaḥ 21

The frequent connectors listed above add up to about 1500 of the 10K words.


Was it predominantly a conversation between two friends (first and second person)? Did it have references to others, such as normal people, exemplary people and objects (third person)?

first person:  aham (i) 103 mām (me) 84 me 61 mama (my) 24 mayā (by me) 22 mayi (in me) 20

second person:  tvam (you) 31 tvām (to you) 20

third person:  ye te (those who…they) 95 yaḥ saḥ (he who…he) 140 tasya (his) 21

object: yat-tat (that which-that) 170 idam (this) 39 etat (this) 25 ayam (this) 24 iha (in this) 21


mayi āveśhya mano ye māṁ nitya-yuktā upāsate
śhraddhayā parayopetās te me yuktatamā matāḥ [12.2] [ye-te-me]

Those (ye) who fix their mind on me (mayi) and always engage in devotion to me (māṁ), they (te) are considered by me (me) to be the best yogis.
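The ye-te-me reading of this verse can be reproduced mechanically. Below is a rough sketch, assuming a small lookup table built from the pronoun lists above; `PERSON` and `tag_pronouns` are hypothetical names. Note that a plain lookup cannot resolve ambiguity: `te` can also mean "to you" in other verses, which is where human judgement comes in.

```python
import unicodedata

# Pronoun forms and their person, taken from the lists in this article.
PERSON = {
    "aham": "first", "mām": "first", "me": "first",
    "mama": "first", "mayā": "first", "mayi": "first",
    "tvam": "second", "tvām": "second",
    "ye": "third", "te": "third", "yaḥ": "third", "saḥ": "third",
}

def tag_pronouns(verse: str) -> list:
    """Return (token, person) for every token that is a listed pronoun."""
    tagged = []
    for token in unicodedata.normalize("NFC", verse).split():
        # The verse text writes a final anusvara (ṁ) where the word list has 'm'.
        key = token.replace("ṁ", "m")
        if key in PERSON:
            tagged.append((token, PERSON[key]))
    return tagged

verse = ("mayi āveśhya mano ye māṁ nitya-yuktā upāsate "
         "śhraddhayā parayopetās te me yuktatamā matāḥ")
print(tag_pronouns(verse))
```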


  1. ahaṁ vaiśhvānaro bhūtvā …[15.14] I am the digestive fire…
  2. iti te jñānam ākhyātaṁ guhyād guhyataraṁ mayā … [18.63] In this manner the knowledge which is more secret than all secrets has been revealed to you by me …

The small list of pronouns above, revolving around ‘me’, ‘you’, ‘they’ and ‘it’, adds up to about 900 words.

Common Pronouns in Bhagavad Gita


For our analysis, the nouns were normalised separately with the help of prefix tagging. They carry the main message of the text. The following word cloud represents about 1200 of the 10K words in the Gita, clustered under themes for easy classification.
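One way to picture this normalisation is to add up all surface forms that begin with the same stem. The sketch below assumes a hand-picked stem list; `normalise_nouns` is a hypothetical name and the counts are made up for illustration. Plain prefix matching also sweeps in compounds (a stem like yoga would catch yogeśhvaraḥ), so the result still needs a human eye.

```python
from collections import Counter

def normalise_nouns(freq: Counter, stems: list) -> Counter:
    """Add up the counts of all surface forms that start with a given stem."""
    totals = Counter()
    # Longer stems are tried first so that the most specific stem wins.
    ordered = sorted(stems, key=len, reverse=True)
    for word, count in freq.items():
        for stem in ordered:
            if word.startswith(stem):
                totals[stem] += count
                break
    return totals

# Made-up counts for a few inflected forms; real numbers come from the full text.
toy_freq = Counter({"arjunaḥ": 3, "arjunam": 1, "arjuna": 2,
                    "kṛṣṇaḥ": 2, "kṛṣṇa": 1, "ca": 5})
noun_totals = normalise_nouns(toy_freq, ["arjuna", "kṛṣṇ"])
print(noun_totals)
```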

Names of Arjuna & Krishna 205

arjuna 48 pārtha 42 kauntey 25 bhārata 22 kṛṣṇ 14 bharatṛṣabh 8 guḍākeś 4 bhagavān 28 keśav 7 hṛṣīkeśa 5 govinda 2

Related to Moderation 275

yoga 105 jñān 91 yukta (moderation) 38 sama (equanimity) 41

Related to Karma (Action)

karma 119 kuru (do) 18 phal (fruit) 32 saṅg (attachment) 34 akarma 8 dharma 20

Related to Bhakti (Devotion) 247

brahma (creator) 42 deva (gods) 35 bhava (emotion) 10 bhakt (devout) 25 yagna (sacrifice) 12 bhog (enjoy) 15 priya (dear) 21 upāsana (worship) 6

Ascent through Sense Organs, Mind, Intellect & Soul 273

indriya (senses) 43 mana (mind) 39 buddhi (intellect) 55 ātma (soul) 136

Various Emotions 135

kāma (lust) 44 duḥkha (sadness) 28 sukha (happiness) 32 rāga (love) 10 dveṣa (hatred) 13 bhaya (fear) 8 krodhaḥ (anger) 6

Physical Elements & Traits 200

bhūtā (elements) 72 prakṛti (nature) 29 janma (birth) 17 mṛtyu (death) 10 guṇa (attributes) 23 tamasa (inert) 15 rājasa (passionate) 14 sāttvik (good) 15

The above noun list accounts for around 1200 of the 10K words in the Gita.


The verb list was extracted using suffixes for common verb patterns in the Gita. The Gita is predominantly a conversation between the two friends in the first and second person, with references to mankind in the third person, singular and plural. The following cloud presents around 500 high-frequency verbs.
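Suffix-based extraction of this kind might look as follows. This is a sketch under assumptions, not the script used for the analysis: the suffix list is a small hand-made sample and `guess_verb_form` is a hypothetical helper. Suffix matching over-generates (a noun ending in -ati would be tagged too), so the output is a candidate list to be reviewed.

```python
import unicodedata

# Suffix -> label, checked in order so that plural '-anti' is tested before '-ati'.
VERB_SUFFIXES = [
    ("anti", "third person plural"),
    ("āmi", "first person singular"),
    ("omi", "first person singular"),
    ("asi", "second person singular"),
    ("ati", "third person singular"),
    ("ate", "third person singular"),
]

def guess_verb_form(word: str):
    """Return a label if the word ends in a listed suffix, else None."""
    word = unicodedata.normalize("NFC", word)
    for suffix, label in VERB_SUFFIXES:
        if word.endswith(suffix):
            return label
    return None

for w in ["paśyāmi", "arhasi", "bhavati", "ucyate", "paśyanti", "karma"]:
    print(w, "->", guess_verb_form(w))
```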

First Person Singular

Think of these as mostly Krishna speaking about himself. For instance, icchāmi means “I wish”.


paśyāmi (I see) 7 icchāmi (I wish) 6 visṛjāmi (I create) 2 hṛṣyāmi (I take pleasure) 2 dadāmi (I give) 2 sambhavāmi (I appear) 2 śaknomi (I am able) 1


pravakṣyāmi (I shall explain) 4 vakṣyāmi (I shall explain) 4
Total 34

Second Person

These are mostly Krishna addressing his friend Arjuna.

asi (you are) 17 arhasi (befitting you) 10 avāpsyasi (you will attain) 4 prāpsyasi (you will attain) 2 icchasi (you desire) 3
Total 36

Third person singular

These relate to actions pertaining to different kinds of people (the exemplary and not so exemplary), actions pertaining to objects. It is interesting to note that singular usage seems more prevalent than plural usage.

Interestingly in Sanskrit there are two kinds of verbs. Very simply and crudely these can be described as: a) the sophisticated higher order verbs, such as arise-born-bloom-gain, and b) day-to-day action verbs which are associated with nouns, such as sit-eat-give-go. Based on the suffix patterns one can differentiate the two kinds.

Sophisticated: ucyate (is called) 28 vidyate (is there) 9

Regular: bhavati (happens) 17 paśyati (sees) 12 bhavet (would be) 2 tyajet (should sacrifice) 3 adhigacchati (attains) 8

Plural: viśanti (enter) 7 paśyanti (see) 5 gacchanti (go) 6


Let us look into “accompanying verbs”, also known as gerunds. They are not the main verb of the sentence but accompany it, making the sentence construction more terse.

dṛṣṭvā (having seen) 11 paśyan (seeing) 3 tyaktvā (having relinquished) 13 āśritya (having resided) 3 labdhvā (having obtained) 2 kartum (to do) 7 śocitum (to lament) 3 hantum (to kill) 3 kurvan (doing) 5 smaran (remembering) 3

That was a total of 446 verbs out of the 10K words.


Let us take a peek at a few popular noun patterns for different contexts. In Sanskrit, the context is embedded in the suffix. This is the wonder of Sanskrit: because each word carries its own role, a sentence largely stays valid no matter how the words are shuffled.

For beginners, I like this approach as it reinforces context patterns in a natural way. I have taken them from popular verses which follow a constant pattern.

Kartha, the Subject (aḥ)

saḥ (he) yaḥ (he who)
manaḥ (mind) yogaḥ puruṣaḥ (person)
arjunaḥ sañjayaḥ

Karma, the object (am)

patraṁ puṣhpaṁ phalaṁ toyaṁ [9.26] leaf, flower, fruit, water

Karana, the instrument of action

[With] (ena/eṇa suffix)

abhyāsena tu kaunteya vairāgyeṇa cha gṛihyate [6.35] With practise, With Detachment

abhyāsena (with practise)
vairāgyeṇa (with detachment)
yogena (with Yoga)

Sense of Offering (āya)

paritrāṇāya sādhūnāṁ vināśhāya cha duṣhkṛitām … [4.8]

paritrāṇāya (to protect)
vināśhāya (to annihilate)
saṁsthāpanārthāya (to establish)

Sense of Position

dharma-kṣhetre kuru-kṣhetre [1.1] At the field of dharma, at Kurukshetra ..

dharmakṣhetre (at the place of dharma), kurukṣhetre (at Kurukshetra), madhye (in between)

Cause & Effect Scenarios

From X arises Y; from Y arises Z. Or: compared to X, Y is …; compared to Y, Z is …

In Sanskrit, the common noun suffixes for cause are ‘āt’ and ‘ād’.

krodhād bhavati sammohaḥ … [2.63]

annād bhavanti bhūtāni … [3.14]

śhreyo hi jñānam abhyāsāj jñānād dhyānaṁ viśhiṣhyate … [12.12]

krodhāt (from anger) sammohāt (from confusion) smṛiti-bhranśhāt (from clouding of memory) buddhi-nāśhāt (from clouding of judgement) annād (from food) parjanyād (from rain) yajñād (from sacrifice)

Relational Words

Arjuna’s Bow, Arjuna’s Sorrow.. The common noun suffix for relation is ‘sya’

tasya (his) dharmasya (dharma’s)
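The singular suffix patterns above can be collected into a small lookup. This is a sketch for illustration only: `CASE_SUFFIXES` and `guess_noun_sense` are hypothetical names, the labels follow the headings of this article, and the table covers only the endings shown here. It will mislabel many words (a verb such as gṛihyate ends in ‘e’ too), so treat it as a reading aid rather than a parser.

```python
import unicodedata

# Suffixes -> sense, following the headings used in this article.
# Order matters: more specific endings come first and the bare 'e' comes last.
CASE_SUFFIXES = [
    (("ena", "eṇa"), "instrument (with)"),
    (("āya",), "offering (to/for)"),
    (("sya",), "relation (of)"),
    (("āt", "ād"), "cause (from)"),
    (("aḥ",), "subject"),
    (("am", "aṁ"), "object"),
    (("e",), "position (at/in)"),
]

def guess_noun_sense(word: str):
    """Return the sense suggested by the word ending, or None."""
    word = unicodedata.normalize("NFC", word)
    for suffixes, sense in CASE_SUFFIXES:
        if word.endswith(suffixes):  # str.endswith accepts a tuple of endings
            return sense
    return None

for w in ["puruṣaḥ", "phalaṁ", "abhyāsena", "paritrāṇāya", "krodhāt", "dharmasya"]:
    print(w, "->", guess_noun_sense(w))
```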


Let us see a few examples of different noun contexts in plural forms.


indriyāṇi (sense organs) śastrāṇi (weapons) bhūtāni (elements)

Sense of Relativeness

This pattern is found consistently in verses 10.21 to 10.31 of Vibhuthi Yoga. In the description of plurality and opulences, Krishna describes “among A I am B, among C I am D, among E I am F…”

vedānāṁ sāma-vedo ’smi devānām asmi vāsavaḥ
indriyāṇāṁ manaśh chāsmi bhūtānām asmi chetanā

I am the Sāma Veda among the Vedas, and Indra among the celestial gods. Among the senses I am the mind; Among the living beings I am consciousness.
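The "among A" words in these lines share the ending -ānām (or -āṇām), which makes them easy to pick out with a regular expression. The pattern below is an assumption made for this sketch and covers only this ending; other plural endings would need their own patterns.

```python
import re
import unicodedata

verse = ("vedānāṁ sāma-vedo ’smi devānām asmi vāsavaḥ "
         "indriyāṇāṁ manaśh chāsmi bhūtānām asmi chetanā")

# Words ending in -ānām / -āṇām, with the final m optionally written as ṁ.
AMONG = re.compile(r"\w+ā[nṇ]ā[mṁ]\b")

# NFC keeps each accented letter as a single character so that \w matches it.
among_words = AMONG.findall(unicodedata.normalize("NFC", verse))
print(among_words)  # ['vedānāṁ', 'devānām', 'indriyāṇāṁ', 'bhūtānām']
```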

Sense of Location

When I first heard this chapter ending line, I was intrigued by it. What did these suffixes mean? They have such a rhythmic effect.

Shrimad-Bhagavad-Gitasu Upanishadsu Brahmavidyayam Yogashastre Shri Krishna-Arjuna-samvade

In the Gita, in the Upanishads, in the Brahma Vidya & Yoga Shastras, in the conversation between Krishna & Arjuna

In this manner of skimming through high-frequency words, we have covered about 1500 connectors, 900 pronouns, 450 verbs and 1200 nouns. That makes it around 4K out of the 10K words in the dictionary.

This definitely helps us get a feeling for the language and the conversation style of the Gita.


After this frequency analysis, let us now inspect how many words begin with each letter.

I have represented characters as per Sanskrit Alphabet matrix (swara and vyanjana) and for simplicity, clustered the consonant pairs.

Bhagavad Gita – letterwise word frequency

Words starting with ‘a’, ‘s’ and ‘m’ appear to be the most common. It is interesting to note that no words begin with the retroflex series ‘ṭ’, ‘ḍ’, ‘ṇ’. If we list the top word starting with each letter, we get the following.

aham 103 ātmā 37 iti 65 īśvaraḥ 8 uvāca 64 ūrdhvam 3 eva 173 aiśvaram 4 oṁ 4 auṣadhīḥ 1 ṛṣabha 8 karma 84 khe 1 guṇa 16 ca 392 chinna 3 jñānam 31 jhaṣāṇām 1 tat 93 deva 16 dhanam 12 na 256 pārtha 38 phalam 16 brahma 42 bhagavān 28 mām 84 yat 79 rūpam 16 loke 12 vā 25 śrī 29 ṣaṭ 2 saḥ 74 hi 67
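A letter-wise list like this can be built by grouping the frequency table on the first letter. The sketch below assumes IAST input, where aspirated consonants and diphthongs take two characters; `DIGRAPHS`, `initial_letter` and `top_word_per_letter` are hypothetical names, and the counts are a handful copied from this article.

```python
from collections import Counter, defaultdict

# Single letters of the Sanskrit alphabet that take two characters in IAST.
DIGRAPHS = ("kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh", "ai", "au")

def initial_letter(word: str) -> str:
    """Return the first letter, treating an IAST digraph as one letter."""
    return word[:2] if word.startswith(DIGRAPHS) else word[:1]

def top_word_per_letter(freq: Counter) -> dict:
    """Map each initial letter to its most frequent (word, count)."""
    by_letter = defaultdict(Counter)
    for word, count in freq.items():
        by_letter[initial_letter(word)][word] = count
    return {letter: words.most_common(1)[0] for letter, words in by_letter.items()}

# A few counts from the tables in this article.
letter_freq = Counter({"ca": 392, "na": 256, "eva": 173, "aham": 103, "api": 101,
                       "brahma": 42, "bhagavān": 28, "deva": 16, "dhanam": 12})
top_words = top_word_per_letter(letter_freq)
print(top_words)
```

Here `brahma` and `bhagavān` land under different letters (‘b’ and ‘bh’), as they do in the list above.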

Words of Direction, Time and Count

ūrdhva-mūlam adhaḥ-śhākham [15.1] The aśhvattha tree with its roots above and branches below. ūrdhva (up) adhaḥ (down)

namaḥ purastād atha pṛiṣhṭhatas [11.40] Salutations from front and rear
puratah (front) pṛiṣhṭhah (back)

aham ādiśh cha madhyaṁ cha bhūtānām anta eva cha [10.20] ādi (beginning), madhya (middle), anta (end)

Other descriptors used to indicate infinity, imperishability, eternity and plurality are akshaya, avyaya, anantha, sanathana, nana, bahu and aneka. These are naturally found in the Vibhuthi Yoga and Vishwa Roopa Darshanam chapters.

paśhya me pārtha rūpāṇi śhataśho ’tha sahasraśhaḥ
nānā-vidhāni divyāni nānā-varṇākṛitīni cha [11.5]

śhataśhaḥ (hundreds), sahasraśhaḥ (thousands), nānā (many)


The sheer number of sophisticated words can seem overwhelming. A simple trick is to take cues from prefixes and suffixes and to look for affinity with the underlying word.

The prefixes add a sense of direction and command to the words thereby highlighting, negating, reciprocating or bringing a sense of reinforcement. Truly this makes each word an episode by itself.

Ascertaining (forward)

pravakṣyāmi (shall reveal/tell) pravṛttim (acting properly) pratiṣṭhitā (firmly established; tishta is to stand)

[pravakṣyāmi] Even the intelligent are bewildered in determining what is action and what is inaction. Now I shall explain to you what action is, knowing which you shall be liberated from all sins. [4.16]

Reciprocating (backward) 

pratiyotsyāmi (fight back; yuddh meaning fight)

How can I counterattack men like Bhishma and Drona? [2.4]

Sense of Circling

pariprasnena (enquiry), parityāgī (renounce), parikliṣṭam (grudging), paritrāṇāya (to protect; tra means to protect)

[pariprasnena] Inquire from him and render service unto the Spiritual Master. [4.34]

Transcend (beyond)

ativartate (transcends; vartate is dwelling) atitaranti (cross over; taranti is swim)


adhigacchati (attains) adhiṣṭhānam (body; sthanam is place)

[adhiṣṭhānam] The body, the doer, the various sense organs, the many kinds of efforts and Divine Providence — these are the five factors of action [18.14]

In Proximity

upāshritya (refuge near me) upadraṣṭā (overseer or inner seer) upasaṅgamya (approaching near)

[upadraṣṭā] Yet in this body there is a transcendental enjoyer, who is the Lord, the supreme proprietor, the overseer, permitter and the Supersoul [13.23]

Generate (Give rise to)

udbhavam (generate)

Presence of the quality (with)

sammohaḥ (confusion) samupasthitam (all present) saṁśuddhiḥ (total purification) sambhūtam (born) sambhavāmi (appear) sañjāyate (develops; jāyate is ‘is born’)

Absence of Quality

nirahaṅkāraḥ nirvedam nirmuktāḥ nirvikāraḥ

Descend (down)

avagaccha (understand; gaccha is go)
avajānanti (disregard) avajñātam (with contempt)

Follow (behind)

anubandhan (bound) anukampā (compassion, kampa is vibration) anuvartante (follow) anupaśyati (see; pasyati is to see) anusmara (remember) anudarśanam (observing; darshan is to see)

Towards / transitioning

abhijānāti (get to know; jānāti is know)
abhijāyate (take birth; ja is birth) abhirakṣitam (well protected)
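Taking cues from prefixes can be mimicked with a longest-match lookup. This is a minimal sketch, assuming only the prefixes discussed in this section; `PREFIXES` and `split_prefix` are hypothetical names. It works on spelling alone, so it misses prefixes hidden by sound changes (upāshritya, sañjāyate) and it will also split words that merely happen to begin with the same letters.

```python
import unicodedata

# Prefixes discussed in this section, longest first so 'prati' is tried before 'pra'.
PREFIXES = sorted(
    ["pra", "prati", "pari", "ati", "adhi", "upa", "ud",
     "sam", "nir", "ava", "anu", "abhi"],
    key=len, reverse=True,
)

def split_prefix(word: str):
    """Return (prefix, remainder), or (None, word) if no listed prefix matches."""
    word = unicodedata.normalize("NFC", word)
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return prefix, word[len(prefix):]
    return None, word

for w in ["pravakṣyāmi", "pratiyotsyāmi", "ativartate", "anupaśyati", "abhijānāti"]:
    print(w, "->", split_prefix(w))
```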


Normalising all the words would drastically reduce the variance and the number of unique words. In addition, bucketing the words automatically into the 4 categories and rerunning the frequency distribution would give more precise data. For now, without normalisation, the distribution of the 10K words looks as follows. It gives an idea of the words that are not so frequent.

Unique words: 3848
On an average, a word occurs 3 times
Standard Deviation from average: 10
Minimum number of occurrences of word: 1
50% of words occur only once
75% of words occur twice or less
Maximum occurrence of a word was 392
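Summary figures of this kind can be derived from the frequency table with the Python standard library. The sketch below uses a made-up toy table, so its numbers are not those of the Gita; `describe` is a hypothetical helper and the choice of the population standard deviation is an assumption.

```python
import statistics
from collections import Counter

def describe(freq: Counter) -> dict:
    """Summarise how often words occur in a frequency table."""
    counts = sorted(freq.values())
    return {
        "unique": len(counts),
        "mean": statistics.mean(counts),
        "stdev": statistics.pstdev(counts),
        "min": counts[0],
        "median": statistics.median(counts),
        "max": counts[-1],
        # Share of words that occur exactly once.
        "share_once": sum(1 for c in counts if c == 1) / len(counts),
    }

# Toy table with made-up counts, for illustration only.
toy = Counter({"ca": 6, "na": 3, "eva": 2, "karma": 1,
               "yoga": 1, "dharma": 1, "khe": 1, "ṣaṭ": 1})
summary = describe(toy)
print(summary)
```

Even in this toy table most words occur once while a few connectors dominate, which is the same shape as the distribution reported above.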


Compared with most modern languages, Sanskrit is highly structured and complex. With its 8 noun contexts (cases), 3 genders, 3 numbers (singular, dual and plural), multiple verb contexts and elaborate schemas for word formation and compounding, it takes a lot of cognition and training to master the language. This higher-order skill is needed to parse and appreciate the whole gamut of documents in Sanskrit.

However, for acquiring beginner-level parsing skills for a specialised, popular work like the Gita, pattern-based learning and visualisations can be of great help to a novice.

The present work can give seekers of the Gita a head start and a feel for the Sanskrit language. Within the Gita’s high-frequency patterns, the effort needed to cover the whole gamut of grammar is greatly reduced, as we are mostly focusing on masculine words (leaving out neuter and feminine), the present tense, and singular verbs and nouns (leaving out dual and plural to an extent). The gerunds are very nicely woven into the verses.

It could be used in the field of NLP and in education at schools and universities. For instance, visualizations and grammar woven around a topic make the subject easier to study. The top 20 words for each letter, the top 20 verbs and the top 20 nouns give a theme around which to learn.

The idea used could be extended to visualise messages and language styles of the Vedas, Ramayana, Patanjali Sutras, Lalitha Sahasranama, works of Adi Shankaracharya etc. to make a comparative study of language styles and moods.

Subtler aspects like highlighting Acoustic Patterns in words, poetic meter visualisations in sub verses, chapter wise sentiments, verse wise sentiment, specialty about first-last word of a verse, classifications, etc. can be further analysed with Data Science.

Editor’s note: An earlier version of this article was published HERE. It is republished here with the author’s permission.

Cover image: Dialogue with Arjuna, painting by Keshav Venkataraghavan
