# Automate the Boring Stuff with Python (course)

## Overview

Topic:: [[Python (MOC)]]
URL:: [Course Website]()

## Table of Contents

1. Python Basics
2. Flow control
3. Functions
4. Error handling
5. Lists
6. Dictionaries
7. Strings
8. Command line
9. Regular expressions
10. Files
11. Debugging
12. Web scraping
13. Excel, Word and PDF
14. Email
15. GUI automation

## Notes

### Basics

Python components:

1. **Expressions** - for example `2 + 2`. A combination of values and operators that evaluates to a single value.
2. **Functions** - a named block of code that runs the same logic every time it is used. Functions are *called*.
3. **Comments** - start with `#`. Python ignores everything after the `#` on that line.
4. **Statements** - instructions that perform an action, such as an assignment or an `if` condition (they are not expressions themselves, but they can contain expressions).

### Flow Control

Comparison operators: `<=`, `==`, `>=`, `!=`, `<`, `>`

Boolean operators: `or`, `not`, `and`

#### If Statements

The conditions of an `if`/`elif`/`else` chain are checked in order, and only the first one that is true runs (if the first condition is true, its block runs and all the others are skipped).

```python
x = 5
if x == 5:  # this block runs
    print("x is equal to 5")
elif x < 5:
    print("x is lower than 5")
else:  # runs only if none of the conditions above were true
    print("x is higher than 5")
```

#### While Loops

A `while` loop repeats as long as its condition is true. Make sure something inside the loop eventually makes the condition false (or calls `break`), otherwise the loop runs forever.

```python
x = 0
while x < 5:
    if x == 1:
        x += 1
        continue  # skip the rest of this iteration and go back to the condition
    print("the value of x is", x)  # prints 0, then 2
    x += 1
    if x == 3:
        break  # stop the loop on this condition
```

#### For Loops

A `for` loop performs an action for each item in a sequence. With `range(start, stop, step)` you control where the range starts, where it stops (the stop value itself is excluded), and the step size.

```python
for item in range(10, 21, 2):
    print(item)  # prints 10, 12, 14, 16, 18, 20
```

### Functions

Defining a function is not the same as calling it: the body only runs when the function is called. Functions reduce repetition and the chances of error.
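To illustrate the difference between defining and calling, here is a small sketch with a hypothetical `record_greeting` function that logs each call in a list: the `def` statement alone runs nothing, and every call reuses the same body instead of repeating the code.

```python
calls = []  # records every time the function body actually runs

def record_greeting(name):
    # this body runs only when the function is called, not when it is defined
    calls.append(name)
    return "hello, my name is " + name

print(len(calls))  # 0: the def statement alone did not run the body

# each call reuses the same logic instead of repeating the code
greetings = [record_greeting(name) for name in ["idan", "dana"]]
print(len(calls))  # 2
print(greetings)   # ['hello, my name is idan', 'hello, my name is dana']
```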
A function can have:

- **Arguments** - values passed in each time the function is called (for example, which name to print).
- **Optional arguments** - like arguments, but with a default value, so they can be left out when calling the function.
- **A return value** - a value the function sends back, which can be assigned to a variable.

```python
def my_func():
    print("hello")
    print("my name is")
    print("idan")

my_func()

def func_with_argument(name):
    print("hello")
    print("my name is")
    print(name)

func_with_argument("idan")

def func_with_default_argument(name, greeting='hello'):
    print(greeting)
    print("my name is")
    print(name)

func_with_default_argument("idan")  # greeting falls back to its default, 'hello'

def func_with_return(name, greeting='hello'):
    print(greeting)
    print("my name is")
    print(name)
    full_greeting = greeting + " my name is " + name
    return full_greeting

my_greeting = func_with_return(name='idan')  # 'hello my name is idan'
```

### Scope

A variable assigned outside of any function belongs to the global scope and can be read by all functions. A variable assigned inside a function belongs to that function's local scope: it is only available inside that function and is discarded when the function returns. When a function uses a variable, Python first looks for it in the local scope and, if it is not found there, moves outward (to an enclosing function, then to the global scope). Use `global my_var` inside a function to assign to the global variable instead of creating a new local one.

```python
eggs = 40

def my_func():
    eggs = 20  # creates a new local variable; the global eggs is untouched
    print(eggs)

my_func()    # prints 20
print(eggs)  # prints 40

def my_func2():
    global eggs
    eggs = 20  # assigns to the global variable
    print(eggs)

my_func2()   # prints 20
print(eggs)  # prints 20
```

### Lists

A list is a comma-separated sequence of objects inside square brackets, for example ints, strs or variables.
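A minimal sketch of the definition above, using made-up values: a list can hold items of different types (including the value of a variable and even another list), and it works with `len()`, `in`, negative indexes and `for` loops.

```python
count = 3
# a list can mix types, including the value of a variable and another list
mixed = [count, "apples", 4.5, ["nested", "list"]]

print(len(mixed))         # number of items: 4
print(mixed[-1])          # negative indexes count from the end: ['nested', 'list']
print("apples" in mixed)  # membership test: True

# enumerate() gives each item together with its index
for index, item in enumerate(mixed):
    print(index, item)
```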
```python
my_list = ["hey", "hello", "hi"]

# lists can be indexed (zero-based)
my_list[0]   # "hey"
# or sliced
my_list[:2]  # ["hey", "hello"]

# useful list methods
my_list.index("hello")       # index of the first occurrence of an item: 1
my_list.append("hey there")  # add an item to the end of the list
my_list.insert(2, "hi")      # add an item at a given index
my_list.remove("hey")        # delete the first occurrence of an item
my_list.sort()               # sort in place, numerically or alphabetically (uppercase sorts before lowercase)
```

### Mutability

Strings and tuples are immutable: you can't change them in place, you can only create a new value and assign it to a variable. Lists, dictionaries and pandas DataFrames, for example, are mutable: they can be changed in place.

### References

Variables store references to objects. Assigning one variable to another (`b = a`) does not copy the object; both names now point to the same object. For example:

```python
str_a = "pizza"
str_b = str_a    # both names point to the same "pizza" string
str_b = "hello"  # str_b now points to a new string; str_a is unchanged
```

In this case `str_a` and `str_b` end up with different values: strings are immutable, so the only way to "change" `str_b` is to point it at a new object. Mutable objects, however, can be changed in place, so a change made through one name is visible through the other:

```python
my_list = ["a", "b", "c"]
new_list = my_list
new_list.append("d")
print(new_list)
print(my_list)  # both print ['a', 'b', 'c', 'd'], because they are the same list
```

To avoid this, make a copy, for example with `my_list.copy()` (or `copy.deepcopy()` for nested objects).

### Dictionaries

A collection of key-value pairs. Keys must be unique. Since Python 3.7, dictionaries keep their insertion order, but they are not indexed by position, and two dictionaries with the same pairs in a different order are still equal.

```python
my_dict = {"age": 25, "name": "johnas", "gender": "male"}

# you can access (or loop over) the dictionary using
my_dict.keys()    # dict_keys(['age', 'name', 'gender'])
my_dict.values()  # dict_values([25, 'johnas', 'male'])
my_dict.items()   # dict_items([('age', 25), ('name', 'johnas'), ('gender', 'male')])

# useful methods
my_dict.get("school", "no school in dict")  # the value for the key if it exists, otherwise the default
my_dict.setdefault("age", 15)               # the value for the key; if the key is missing, first adds it with the default
```

### Strings

Strings are sequences of characters and behave like lists in several ways:

```python
my_str = "hello"
# they can be indexed and sliced
hel = my_str[:3]  # "hel"
# "in" checks whether a substring is present
"hello" in my_str  # True
```

Escape characters (a backslash followed by a character) let you include characters that would otherwise be awkward, such as a quote inside a quoted string, or add functionality such as a newline (`\n`) or a tab (`\t`). A raw string (prefixed with `r`) treats backslashes as literal characters.

```python
my_str = 'this is carol\'s cat'   # this is carol's cat
# raw string: the backslash is kept as part of the string
my_str = r'this is carol\'s cat'  # this is carol\'s cat
```

For long strings that span several lines, use triple quotes (the line breaks are kept):

```python
my_long_str = """this is a very long string
that spans several lines"""
```

Useful methods:

```python
my_str = "Hello"

# boolean testing
my_str.isalpha()        # True if it contains only letters
my_str.isalnum()        # True if it contains only letters and numbers
my_str.isdecimal()      # True if it contains only digits
my_str.startswith("h")  # False: the check is case-sensitive
my_str.endswith("o")    # True

# str manipulation (each returns a new string; my_str itself is unchanged)
new_str = my_str.upper()                     # all uppercase: "HELLO"
new_str = my_str.lower()                     # all lowercase: "hello"
new_str = ' '.join(['one', 'two', 'three'])  # list to a single str: "one two three"
new_list = 'hello world'.split(" ")          # str to list: ['hello', 'world']
new_str = my_str.rjust(20, "*")              # pads on the left until the str is 20 characters long
new_str = my_str.ljust(20, "*")              # pads on the right until the str is 20 characters long
new_str = my_str.strip()                     # removes whitespace from both ends
new_str = my_str.replace("l", "L")           # replaces every occurrence: "HeLLo"
```

String formatting (inserting values into a string):

```python
place = "home"
time = "noon"
food = "pizza"
new_str = "hey, we are meeting at {} at {}, don't forget to bring a {}".format(place, time, food)
```

### Regex

```python
import re

# pattern to look for
message = "hey, my number is 111-3333-111"
pattern = re.compile(r'\d\d\d-\d\d\d\d-\d\d\d')
match = pattern.search(message)  # the first match as a match object, or None if nothing matches
result = match.group()           # "111-3333-111"

# to look for all matches
all_matches = pattern.findall(message)  # ['111-3333-111']

# to match one option out of several - the pipe "|"
re.compile(r'Bat(man|mobile|copter)')  # matches "Batman", "Batmobile" or "Batcopter"

# optional, zero or one time - "?"
re.compile(r'Bat(wo)?man')  # matches "Batman" and "Batwoman"
# one or more times - "+"
# zero or more times - "*"
# an exact number of times - {n}, or a range such as {3,5} for 3 to 5 times
```

By default, regex quantifiers are "greedy": they match the longest possible string. Adding a `?` after a quantifier makes it non-greedy (it matches the shortest possible string), for example:

```python
pattern = re.compile(r'\d{3,5}?')  # on "12345" this matches "123" rather than "12345"
```

`findall()` returns a list of strings when the pattern has zero or one group (with one group, each string is that group's text), or a list of tuples when the pattern has two or more groups. Unlike `search()`, it doesn't return match objects.

```text
\d  - digit (numeric character)
\w  - word character (letter, digit or underscore)
\s  - whitespace (space, tab, newline)
^   - must be at the start of the string
$   - must be at the end of the string
.   - wildcard: any character except a newline
()  - group: lets you extract part of the match, e.g. r'first name: (\w+)'
      matches the whole pattern, but group(1) returns only the first name
\.  - to match a special character literally (e.g. "." as a dot, not a wildcard),
      escape it with a backslash
```

The uppercase versions match the opposite: `\D` matches any non-digit character, `\W` any non-word character, and `\S` any non-whitespace character.

You can define your own character class with square brackets: `r'[aeiou]'` matches any lowercase vowel. A `^` right after the opening bracket negates the class: `r'[^aeiou]'` matches any character that is not a lowercase vowel.

Additional flags (passed as the second argument of `re.compile()`):

- `re.DOTALL` - `.` also matches newline characters.
- `re.I` (`re.IGNORECASE`) - case-insensitive matching (upper and lower case are treated the same).

You can also use `re.VERBOSE` to make your regex more readable: whitespace and `#` comments inside the pattern are ignored.

```python
pattern = re.compile(r'''
    # this regex is for a phone number
    \d{3}-  # first 3 digits for state
    \d{5}-  # main 5 digits for city
    \d{3}   # last 3 digits for household
''', re.VERBOSE)
```

Since `re.compile()` takes only one flags argument, combine several flags with the `|` operator: `re.VERBOSE | re.IGNORECASE`.

#### The sub Method

`sub()` replaces matches with a different string.
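A small sketch of how `sub()` behaves when the text contains several matches, assuming a made-up message with two agents: by default every match is replaced, the `count` argument limits how many are replaced, and the module-level `re.sub()` does the same job without compiling the pattern first.

```python
import re

# a made-up message containing two matches
message = "agent Alice met agent Boris"
pattern = re.compile(r"agent (\w)\w*")

# by default, sub() replaces every non-overlapping match
all_redacted = pattern.sub(r"Agent \1****", message)
print(all_redacted)  # Agent A**** met Agent B****

# count limits how many matches are replaced, from left to right
first_only = pattern.sub("REDACTED", message, count=1)
print(first_only)  # REDACTED met agent Boris

# module-level equivalent, without compiling the pattern first
same = re.sub(r"agent (\w)\w*", r"Agent \1****", message)
print(same == all_redacted)  # True
```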
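```python
import re

pattern = re.compile(r'agent \w+')
message = 'hey, my name is agent Boris'
pattern.sub('REDACTED', message)  # 'hey, my name is REDACTED'

# use part of the match in the replacement: \1 refers to group 1 (use a raw string)
# for example, keep only the first letter of the name
pattern = re.compile(r'agent (\w)\w*')
pattern.sub(r'Agent \1', message)  # 'hey, my name is Agent B'
```

### File Management

#### The os Module

A useful module for working with file paths and folders:

```python
import os

# build a path with the correct separator for the current operating system
os.path.join("my_folder", "nested_folder", "file_name.csv")
# get the current working directory
os.getcwd()
# change the working directory ("..", goes one level up)
os.chdir("my_path")
# extract the folder path for a given file
os.path.dirname("my_path")
# extract the file name from a given path
os.path.basename("my_path")
# check if a file or folder exists
os.path.exists("my_path")
# create a new folder (including any missing parent folders)
os.makedirs("new_folder_path")
```

#### Reading Text Files

```python
# "with" closes the file automatically, even if an error occurs
with open("file_path") as my_text_file:
    content = my_text_file.read()               # the whole file as one string
    my_text_file.seek(0)                        # go back to the start before reading again
    content_by_line = my_text_file.readlines()  # a list of lines
```

#### Excel Files

```python
import openpyxl

workbook = openpyxl.load_workbook("file_path")
sheet_names = workbook.sheetnames    # list of sheet names
sheet = workbook["sheetname"]        # get a sheet by its name
new_sheet = workbook.create_sheet()  # add a new sheet
new_sheet.title = "my_new_sheet_name"
workbook.save("file_path")           # save after making changes
```

#### PDFs

The code below uses the PyPDF2 2.x/3.x names (`PdfReader`, `pages`, `extract_text()`); older versions used `PdfFileReader`, `getPage()` and `extractText()`.

```python
import PyPDF2

with open("file_path", "rb") as pdf_file:  # PDFs must be opened in binary mode
    reader = PyPDF2.PdfReader(pdf_file)
    page_text = reader.pages[0].extract_text()  # text of the first page
```

#### Word

```python
import docx  # install the package "python-docx"

doc = docx.Document("file_path")
doc.paragraphs[0].text     # text of the first paragraph
doc.add_paragraph("text")  # append a new paragraph
doc.save("file_path")
```

Each paragraph is split into "runs": a run is a stretch of text with the same style, and a new run starts whenever the style changes (bold, underline, italic...). You can access them with `paragraph.runs`.

### Debugging

#### Assertions

You can use `assert` (and `raise`) to stop your code with a custom error when something that should never happen does happen. An `assert` raises an `AssertionError` with your message if its condition is false:

```python
def my_func(num_of_states):
    assert num_of_states < 51, "there are too many states!"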
```

#### Logging

Log messages to the console (or to a file) to get more information about what your code is doing and how far it has progressed. Unlike `print()`, log messages can be filtered by level or switched off without deleting them.

```python
import logging

logging.basicConfig(
    level=logging.INFO,  # ignore messages below INFO (i.e. DEBUG)
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename="my_log.txt",  # write to this file instead of the console
    datefmt='%Y-%m-%d %H:%M:%S',
)
logging.info("this is my message")

# logging levels, from lowest to highest:
# debug, info, warning, error, critical

logging.disable(logging.DEBUG)  # disables all messages at this level or lower
```

#### The Debugger

- **Over** - execute the current line (including any function it calls) and stop at the next line.
- **Step in** - go inside a function call.
- **Step out** - keep running until the current function returns.
- **Go** - run the code until the next breakpoint (or the end of the script).
- **Quit** - terminate the run.

A **breakpoint** makes Python run up to that line and pause there.

### Web Scraping

Opening URLs in the default browser:

```python
import webbrowser
webbrowser.open("https://www.mysite.com")
```

Downloading pages with requests (and parsing them with Beautiful Soup):

```python
import requests
import bs4  # install the package "beautifulsoup4"

res = requests.get("my_url")
res.raise_for_status()  # raises an exception if the download failed
res.text                # the page's HTML as a string

# parse the HTML with Beautiful Soup
soup = bs4.BeautifulSoup(res.text, "html.parser")
elements = soup.select("my_css_selector")  # list of all matching elements
elements[0].text.strip()
```

If you need to fill out login forms or search bars, control a real browser with Selenium (Selenium 4 syntax; the older `find_element_by_*` methods have been removed):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("url")  # loads the page; get() returns None
element = browser.find_element(By.CSS_SELECTOR, "css_selector")
element.click()
element.send_keys("insert text here")
element.submit()
browser.quit()
```

### Emails

#### Sending

```python
import smtplib

conn = smtplib.SMTP("smtp.gmail.com", 587)
conn.ehlo()
conn.starttls()
conn.login(user='[email protected]', password='1234')
conn.sendmail(
    from_addr='mygmailadd',
    to_addrs='[email protected]',
    msg='Subject: my email title\n\n Hello dear User\n How are you doing?\n Best of luck\n Idan\n\n',
)
# sendmail returns a dictionary of failed recipients; if it is empty, every recipient was accepted
conn.quit()
```

#### Reading

```python
import datetime

import imapclient
import pyzmail  # on Python 3, install the "pyzmail36" package

conn = imapclient.IMAPClient('imap.gmail.com', ssl=True)
conn.login(username='[email protected]', password='password')
conn.select_folder('INBOX', readonly=True)
uids = conn.search(['SINCE', datetime.date(2015, 8, 20)])  # see the IMAP documentation for more search options
message_id = uids[0]  # for example, the first matching message
raw_messages = conn.fetch([message_id], ['BODY[]', 'FLAGS'])

message = pyzmail.PyzMessage.factory(raw_messages[message_id][b'BODY[]'])
message.get_subject()
message.get_addresses('from')
message.get_addresses('to')
message.text_part.get_payload().decode(message.text_part.charset)
conn.logout()
```

### GUI Automation

Controlling the mouse from a script can be risky, because you can't easily use your computer while the script is running. As a failsafe, PyAutoGUI stops the program (by raising `pyautogui.FailSafeException`) if you move the mouse to the top-left corner of the screen.

```python
import pyautogui

# control the mouse
res_width, res_height = pyautogui.size()  # screen resolution
pyautogui.moveTo(x=5, y=10)   # move to an absolute position
pyautogui.moveRel(5, 10)      # move relative to the current position (offset)
pyautogui.click(x=300, y=30)  # click at a position

# control the keyboard
pyautogui.typewrite("hello world")
pyautogui.press("f1")
pyautogui.hotkey("ctrl", "o")

# find components on the screen
pyautogui.screenshot("save_image_to_path")       # take a screenshot and save it
pyautogui.locateCenterOnScreen("my_image_path")  # center coordinates of an image found on screen
```

## External Resources