{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Astronomical data analysis using Python\n", "===\n", "\n", "Lecture 5\n", "-------" ] }, { "cell_type": "markdown", "metadata": { "id": "3AeUr3YP2By3", "slideshow": { "slide_type": "slide" } }, "source": [ "The Python Module Ecosystem\n", "==============\n", "\n", "There are three types of modules you will encounter in Python.\n", "\n", "* Built-in Modules (come with all standard installations of Python)\n", "* Third Party Modules (need to be installed separately)\n", "* Your Own Modules (we will see how to make them soon)" ] }, { "cell_type": "markdown", "metadata": { "id": "TvXP7ZI-2By5", "slideshow": { "slide_type": "slide" } }, "source": [ "Built-in Modules - the Python Standard Library \n", "----------------\n", "\n", "* sys - contains tools for system arguments, OS information etc.\n", "* os - for handling files, directories, executing external programs\n", "* re - for parsing regular expressions\n", "* datetime - for date and time conversions etc.\n", "* csv - for reading and writing CSV tables\n", "\n", "and more than a hundred others that allow you to do many different things like text processing, networking and interprocess communication, internet data handling and much more. There are no built-in modules for advanced mathematics or big data handling.\n", "\n", "https://docs.python.org/3/library/" ] }, { "cell_type": "markdown", "metadata": { "id": "oFYeciW92By6", "slideshow": { "slide_type": "slide" } }, "source": [ "Third Party Modules\n", "------\n", "\n", "These need to be installed separately. There are probably hundreds of thousands of modules in every imaginable area of computing. 
We are only going to learn about a handful of them.\n", "\n", "* **numpy / scipy** - numerical and scientific computing extensions to Python\n", "* **matplotlib** - using Python for plots\n", "* mayavi - for animations in 3D\n", "* pandas - for tabular data analysis\n", "* **astropy** - Python for Astronomers\n", "* **astroquery** - access online astronomy data repositories from Python\n", "* scikit-learn - machine learning and classification tools for Python\n", "\n", "Third-party modules need to be installed separately with a program called `pip`.\n", "See installation instructions at: https://docs.python.org/3/installing/index.html\n", "\n", "For a list of publicly available Python modules see:\n", "https://pypi.org/ which has more than 340k modules available as of Nov. 2021, incl. 651 astronomy packages.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "A5e-91c22By7", "slideshow": { "slide_type": "slide" } }, "source": [ "Making your Own Modules\n", "------------\n", "\n", "Very simple. 
Open a file, say, \"MyModule.py\"\n", "\n", "Write code in the file.\n", "\n", "If the file is in the present folder or in one of the folders listed in the PYTHONPATH environment variable, the following will work.\n", "\n", "    import MyModule\n", "    MyModule.somemethod ...\n", "\n", "* NOTE 1: **The file name must have the extension .py**\n", "* NOTE 2: **When importing, the extension must be dropped.**\n", "* NOTE 3: **Do not create modules with the same name as modules in the Python Standard Library**" ] }, { "cell_type": "markdown", "metadata": { "id": "rWx-U8-w2By8", "slideshow": { "slide_type": "slide" } }, "source": [ "Example Module - Example.py\n", "--------\n", "\n", "    \"\"\"\n", "    This is a custom module.\n", "    Containing some functions for the purpose of demonstration.\n", "    \"\"\"\n", "    def fun1():\n", "        print(\"Inside fun1\")\n", "    \n", "    def fun2():\n", "        print(\"Inside fun2\")\n", "    \n", "    pi = 3.14\n", "    e = 2.7\n", "    \n", "    print(\"I am a Custom Module\")\n", "\n", "The above code is stored in a file on my computer called Example.py. Let's see how to use it.\n", " " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 299 }, "id": "dHMY-rIf2By9", "outputId": "0b554e9f-9594-4aeb-fbca-6e8821553c4c", "scrolled": true, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I am a Custom Module\n" ] } ], "source": [ "import Example" ] }, { "cell_type": "markdown", "metadata": { "id": "wuSH17-T2By_", "slideshow": { "slide_type": "-" } }, "source": [ "Notice the message printed by Example.py. This illustrates that the top-level code of a module runs when the module is imported, so any output it generates will appear on the screen."
] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 129 }, "id": "ELqByNu62BzA", "outputId": "1b4a5d05-5a30-4ba0-e8e0-ebf4836e0153", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.14\n" ] } ], "source": [ "print (Example.pi)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "dSjWOray2BzB", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inside fun1\n" ] } ], "source": [ "Example.fun1()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "t_0P7off2BzC", "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on module Example:\n", "\n", "NAME\n", " Example\n", "\n", "DESCRIPTION\n", " This is a custom module.\n", " Containing some functions for the purpose of demonstration.\n", "\n", "FUNCTIONS\n", " fun1()\n", " \n", " fun2()\n", "\n", "DATA\n", " e = 2.7\n", " pi = 3.14\n", "\n", "FILE\n", " /home/yogesh/Dropbox/python_2021/Example.py\n", "\n", "\n" ] } ], "source": [ "help(Example)" ] }, { "cell_type": "markdown", "metadata": { "id": "04B4d-RT2BzD", "slideshow": { "slide_type": "-" } }, "source": [ "Notice the description. It is what you enclosed in the \"docstring\" at the beginning of the module." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "assert Statement\n", "===" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "ename": "AssertionError", "evalue": "not enough data points", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mA\u001b[0m\u001b[0;34m=\u001b[0m \u001b[0;36m5.0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;31m# assert condition, description\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mL\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"not enough data points\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mA\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"requires a string\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: not enough data points" ] } ], "source": [ "L= [0.1,3,5,7]\n", "A= 5.0\n", "# assert condition, description\n", "assert len(L) > 10, \"not enough data points\"\n", "assert type(A) is type(\"\"), \"requires a string\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use assertions liberally, they make debugging complicated code much\n", "easier. Particularly useful at interfaces e.g. 
just before a function is\n", "called. They are an important component of defensive programming." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Exception handling with try-except\n", "======" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Divisor = 0\n" ] } ], "source": [ "y=5\n", "x=0\n", "try:\n", "    ratio = y/x\n", "except ZeroDivisionError:\n", "    print ('Divisor = 0')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The structure of try-except\n", "====" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "try:\n", "> [do some processing]\n", "\n", "except SomeError:\n", "\n", "> [respond to this particular error condition]\n", "\n", "> raise SomeError # now let something else handle the error" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The try/except syntax has the advantage that what you want to do\n", "appears first; you don’t have to read past a lot of error-trapping code to\n", "find out what a particular block of code is doing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "finally - block is executed even if an exception happens\n", "====" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "f = open('thisfile.txt', 'r')
\n", "try:
\n", "\n", " [do something with the file]\n", "\n", "finally:\n", "\n", " f.close()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "If you know what exception to expect\n", "========" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your specified file is not found.\n" ] } ], "source": [ "try:\n", "    f = open('is_it_there.txt')\n", "except FileNotFoundError:\n", "    # Fallback code\n", "    print(\"Your specified file is not found.\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The for-else construct" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "for i in [1, 2, 3, 4, 5]:\n", "    if i == 3:\n", "        break\n", "else:\n", "    print(\"this block is only executed when no item of the list is equal to 3\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The `with` statement" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No need to explicitly close the file. The with has taken care of it\n" ] } ], "source": [ "# using with statement\n", "with open('output_file', 'w') as file:\n", "    file.write('hello world !')\n", "    print('No need to explicitly close the file. 
The with has taken care of it')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Variable Scoping\n", "====" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "6.283185307179586" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "\n", "def area(r):\n", " \"\"\"Area of circle with radius r\"\"\"\n", " return math.pi * r**2 # Name math is known!\n", "\n", "def volume(r, h):\n", " \"\"\"Vol. of cylinder with radius r, height h\"\"\"\n", " return area(r) * h # Name area is known!\n", "\n", "volume(1., 2.) # Everything should be known at call time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Local variables inside functions\n", "=====" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "global x\n", "global x\n", "local x\n", "global x\n" ] } ], "source": [ "def f1():\n", " print (x) # Use variable x\n", "def f2():\n", " x = \"local x\" # Assign variable x\n", " print (x)\n", "x = \"global x\" # Same name x as in functions\n", "f1() # \"global x\"\n", "print (x) # \"global x\"\n", "f2() # \"local x\"\n", "print (x) # still \"global x\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Function scope\n", "===\n", "\n", "Functions provide a nested namespace (scope). Name references search four scopes:\n", "\n", "* L: the function’s local scope\n", "* E: the scope of enclosing functions\n", "* G: the (module’s) global scope\n", "* B: the built-in scope (e.g. 
print() function)\n", "\n", "Name assignments create local names unless the name is declared with the `global` (or `nonlocal`) statement.\n", "\n", "Because of these scoping rules, if you create a function in your main program that has the same name as a built-in function (e.g. `type()`), your function will shadow the built-in one, leading to unexpected issues. So please don't create functions with the same name as built-in functions.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`global` statement\n", "=====" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "global x\n", "local x\n", "local x\n" ] } ], "source": [ "def f3():\n", "    global x\n", "    x = \"local x\"\n", "    print (x)\n", "\n", "x = \"global x\"\n", "print (x) # \"global x\"\n", "f3() # \"local x\"\n", "print (x) # now \"local x\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "**Using global variables is almost always a bad idea. It makes debugging harder. 
Avoid them.**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Passing rules\n", "====\n", "\n", "* Immutable arguments act as if passed by value\n", "* When changing mutable arguments in place inside the function, the object is changed outside the function too!\n", "\n", "Reminder:\n", "\n", "* Numbers, strings, tuples are immutable\n", "* Lists, dictionaries, numpy arrays are mutable" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`list` - a mutable function argument\n", "====" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5\n", "[1, 2, 3, 4, 5]\n" ] } ], "source": [ "mylist= [1,2,3]\n", "def extendlist(var):\n", "    var.extend([4,5])\n", "\n", "extendlist(mylist)\n", "print (len(mylist))\n", "print(mylist)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "`float` - an immutable function argument\n", "=====" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'float'>\n", "5.0\n", "\n", "<class 'str'>\n", "5.0\n" ] } ], "source": [ "a=5.0\n", "def floattostr(var):\n", "    var = str(var)\n", "    return var\n", " \n", "floattostr(a)\n", "print (type(a))\n", "print (a)\n", "print()\n", "b = floattostr(a)\n", "print (type(b))\n", "print (b)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# `del` keyword in Python" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "The `del` keyword is primarily used to delete objects in Python. 
Since almost everything in Python is an object in one way or another, the `del` keyword can also be used to delete a list, delete a slice of a list, delete a dictionary, remove key-value pairs from a dictionary, delete variables, etc., e.g. `del a, redshift['M31'], mylist[2], otherlist[3:]` \n", "\n", "`del` removes a reference; the memory is freed once no other references to the object remain. This is very useful if you have large arrays that are not going to be used again in your analysis. You may occasionally need to run garbage collection explicitly." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The `is` keyword" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n", "True\n" ] } ], "source": [ "a=5\n", "b=5.0\n", "print (a is b)\n", "print (a==b)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# How to choose variable names " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "Names of variables can contain upper and lower\n", "case English letters, underscores, and the digits from 0 to 9, but the name cannot\n", "start with a digit. Nor can a variable name be a reserved word in Python.\n", "\n", "Choose descriptive variable names, i.e., names that explain the\n", "variable’s role in the program. Well-chosen variable names are essential for making\n", "a program easy to read, easy to debug, and easy to extend. Well-chosen variable\n", "names also reduce the need for comments."
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Unicode variable names are allowed in Python 3\n", "=======" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "21\n", "4\n", "5\n", "6\n", "नमस्कार\n" ] } ], "source": [ "number = 5\n", "எண் = 7\n", "मेरी_सन्ख्या = 9\n", "यादी = [4,5,6,'नमस्कार']\n", "\n", "print (number + எண் + मेरी_सन्ख्या)\n", "\n", "for i in यादी:\n", "    print (i)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Reserved words in Python\n", "======" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "These reserved words cannot be used as variable names: and, as,\n", " assert, async, await, break, class, continue, def, del, elif, else, except, False,\n", " finally, for, from, global, if, import, in, is, lambda, None, nonlocal,\n", " not, or, pass, raise, return, True, try, with, while, and yield. Besides these, some special characters are used in Python programs - **: \\# {} () [ ]** \n", " \n", "**With this we have covered essentially all of Core Python. If you have reviewed the lectures and have practiced the notebooks, you can claim that you are now set to program in Python. The Python standard library and the vast world of third-party modules now await you. Congratulations!**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Assignment 1\n", "\n", "Assignment 1 is nearly ready and will be placed on the Moodle platform and on the website by tomorrow. Please solve the assignment. 
Preetish will conduct a tutorial session, most likely on 2 Dec 2021, to discuss the assignment problems and their solutions.\n", "\n", "**Remember: programming is about the doing, not about the knowing.**\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Plans for the second half of the course\n", "\n", "We are now halfway through the lectures. In the remaining 5 lectures, we will cover\n", "\n", "* numpy + scipy\n", "* matplotlib\n", "* astropy\n", "* astroquery\n", "\n", "The second assignment will be more difficult than the first and will have real-life (although very simplified) code interactions with real data.\n", "\n" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 1 }