I restarted Udacity's computer science class this past week. So far the course has discussed the difference between a toaster and a computer: a toaster is designed to do one thing, while a computer can do anything we program it to do. A program is code that tells the computer what to do in a given situation. I also got to do a little programming in Python, which is the language the course uses. We started out with arithmetic: typing print 1 + 1 and running it in the interpreter produces the result 2. Using print lets us see what our code is doing. Then we moved on to assignment statements. Writing a name, an equals sign, and an expression assigns the value of that expression to the name, so if we write name = 5 * 3 and then print name, the output is 15.
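Here is a small sketch of the arithmetic and assignment steps described above. One assumption on my part: I put parentheses around what gets printed so the snippet also runs on newer versions of Python, while the course interpreter accepts print without them.

```python
# Arithmetic: the expression is evaluated first, then printed.
print(1 + 1)    # 2

# Assignment: the expression on the right is evaluated and the
# resulting value is stored under the name on the left.
name = 5 * 3
print(name)     # 15
```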
So let's say we wanted to print more than just numbers. This is where strings come in. To print a string we can write print 'This is a test' and the output is This is a test. A string needs single or double quotes on each side, so 'the', "isn't" and '"simple" she said.' are all valid strings. As long as the string starts and ends with the same type of quote, it will work. Simple enough, but what if we have a really long string and we don't want to type it out each time? Then we can assign the string to a name and just print the name. For example: einstein_quote = '"Education is what remains after one has forgotten what one has learned in school." Albert Einstein'. We can then run print einstein_quote, and the result is the string "Education is what remains after one has forgotten what one has learned in school." Albert Einstein.
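The quoting rules are easier to see side by side, so here is a minimal sketch. The names first, second and third are just placeholders I made up for illustration, and the parentheses after print are there so it runs on newer Python versions too.

```python
# Single or double quotes both work, as long as both ends match.
first = 'the'
second = "isn't"                # double quotes let us use ' inside
third = '"simple" she said.'    # single quotes let us use " inside

# Assign a long string to a name once, then reuse the name.
einstein_quote = '"Education is what remains after one has forgotten what one has learned in school." Albert Einstein'
print(einstein_quote)
```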
This brings us to another question: what if we have an entire page as a string and we want to find where a specific word or link first appears? We can use the find method. Starting with the string we just used, we could write einstein_quote.find('one') and the result would be the position of the first occurrence of one in our string. Of course, we have to print that expression to actually see the position as a number. I'm going to explain this using a shorter string, 'one'. If I type print 'one'.find('n') I get 1, because positions are numbered starting at zero and n is the second letter in the word one. We can also select part of a string by position. Typing print 'one'[1:] results in ne, because it prints from position 1 through the rest of the string. Typing print 'one'[1:2] gives n, because the end point (2 in this case) means up to but not including that position. We could also write print 'one'[:2] and get on, because it displays from the start of the string up to, but not including, position 2.
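To check the positions for myself, here are the same examples on the short string as a runnable sketch. The name word is my own placeholder, and the parentheses after print are only there so the snippet also works on newer Python versions.

```python
word = 'one'

# Positions start at zero: o is 0, n is 1, e is 2.
print(word.find('n'))   # 1

# [start:] runs from start through the end of the string.
print(word[1:])         # ne

# [start:end] stops just before the end position.
print(word[1:2])        # n

# [:end] runs from the beginning up to, but not including, end.
print(word[:2])         # on
```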
So why and how would we use this? Well, for lesson one the goal is to be able to extract a link from a webpage. Let's go back to our einstein_quote string and say we wanted to print just the part 'one has forgotten what one has learned in school.' (including the period). We could write some code to do this as follows.
first_one = einstein_quote.find('one')
end_point = einstein_quote.find('.') + 1
print einstein_quote[first_one:end_point]
This should produce the result we are looking for. The second step includes a + 1 because a slice (the [:] notation used to show part of a string) stops just before its end position, and we want the period at that position to be included.
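To convince myself about the + 1, here is a small sketch that prints the selection both ways. The name period is a placeholder I introduced for the comparison, and the parentheses after print are there so it also runs on newer Python versions.

```python
einstein_quote = '"Education is what remains after one has forgotten what one has learned in school." Albert Einstein'

first_one = einstein_quote.find('one')   # position of the first 'one'
period = einstein_quote.find('.')        # position of the first '.'

# Without + 1 the selection stops just before the period.
print(einstein_quote[first_one:period])
# one has forgotten what one has learned in school

# With + 1 the period itself is included.
print(einstein_quote[first_one:period + 1])
# one has forgotten what one has learned in school.
```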