Which of the following methods is used to check whether the file denoted by the pathname is a directory?
File Input/Output
File Input/Output (IO) requires 3 steps: open the file, read or write data, and close the file.
Python provides built-in functions and modules to support these operations.
Opening/Closing a File
Reading/Writing Text Files
Reading Line/Lines from a Text File
Writing Line to a Text File
Examples
>>> f = open('test.txt', 'w')
>>> f.write('apple\n')
>>> f.write('orange\n')
>>> f.write('pear\n')
>>> f.close()
>>> f = open('test.txt', 'r')
>>> f.readline()
'apple\n'
>>> f.readlines()
['orange\n', 'pear\n']
>>> f.readline()
''
>>> f.close()
>>> f = open('test.txt', 'r')
>>> f.read()
'apple\norange\npear\n'
>>> f.close()
>>> f = open('test.txt')
>>> line = f.readline()
>>> while line:
...     line = line.rstrip()
...     print(line)
...     line = f.readline()
...
apple
orange
pear
>>> f.close()
Processing Text File Line-by-Line
We can use a with-statement to open the file, which closes it automatically on exit, and a for-loop to read it line by line:
with open('path/to/file.txt', 'r') as f:
    for line in f:
        line = line.strip()
The with-statement is equivalent to the following try-finally statement:
f = open('path/to/file.txt')
try:
    for line in f:
        line = line.strip()
finally:
    f.close()
Example: Line-by-line File Copy
The following script copies a file into another line-by-line, prepending each line with the line number.
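A minimal sketch of such a copy script, assuming hypothetical file names in.txt and out.txt created in a temporary directory (the original script is not shown here, so the function name and number format are illustrative only):

```python
import os
import tempfile


def copy_with_line_numbers(src_path, dest_path):
    """Copy src_path to dest_path, prefixing each line with its line number."""
    with open(src_path, 'r') as src, open(dest_path, 'w') as dest:
        # enumerate() yields (1, first line), (2, second line), ...
        for line_number, line in enumerate(src, start=1):
            dest.write('{}: {}'.format(line_number, line))


# Demo in a temporary directory so that no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    in_file = os.path.join(tmp_dir, 'in.txt')     # hypothetical input name
    out_file = os.path.join(tmp_dir, 'out.txt')   # hypothetical output name
    with open(in_file, 'w') as f:
        f.write('apple\norange\npear\n')
    copy_with_line_numbers(in_file, out_file)
    with open(out_file) as f:
        copied_text = f.read()

print(copied_text, end='')
```

Both files are opened in one with-statement, so each is closed even if the copy fails midway.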
Binary File Operations
[TODO] Intro
For example [TODO]
Directory and File Management
In Python, directory and file management are supported by modules such as os, os.path and shutil.
Path Operations Using Module os.path
In Python, a path could refer to a file, a directory, or a symlink.
A path could be absolute (beginning with root) or relative to the current working directory (CWD). The path separator is platform-dependent (Windows uses '\', while Unix and macOS use '/').
Checking Path Existence and Type
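For example,
>>> import os
>>> os.path.exists('/usr/bin')
True
>>> os.path.isfile('/usr/bin')
False
>>> os.path.isdir('/usr/bin')
True
Forming a New Path
The path separator is platform-dependent (Windows uses '\', while Unix and macOS use '/'), so build paths with os.path.sep and os.path.join() instead of hardcoding the separator.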
For example,
>>> import os
>>> print(os.path.sep)
/
>>> print(os.path.join(os.path.sep, 'etc', 'apache2', 'httpd.conf'))
/etc/apache2/httpd.conf
>>> print(os.path.join('..', 'apache2', 'httpd.conf'))
../apache2/httpd.conf
Manipulating Directory-name and Filename
For example, to form an absolute path of a file called out.txt in the same directory as in.txt:
os.path.join(os.path.dirname(os.path.abspath('in.txt')), 'out.txt')   # absolute path
os.path.join(os.path.dirname('in.txt'), 'out.txt')                    # relative to the current directory
For example,
import os
print('__file__:', __file__)
print('dirname():', os.path.dirname(__file__))
print('abspath():', os.path.abspath(__file__))
print('dirname(abspath()):', os.path.dirname(os.path.abspath(__file__)))
When a module is loaded in Python, __file__ is set to the path of the module's file. Try running this script in different ways and compare the output:
$ python3 ./test_ospath.py
$ python3 test_ospath.py
$ python3 ../parent_dir/test_ospath.py
$ python3 /path/to/test_ospath.py
Handling Symlink (Unixes/Mac OS)
For example,
import os
print('__file__:', __file__)
print('abspath():', os.path.abspath(__file__))
print('realpath():', os.path.realpath(__file__))
$ python3 test_realpath.py
# Same output for abspath() and realpath() because there is no symlink
$ ln -s test_realpath.py test_realpath_link.py
$ python3 test_realpath_link.py
#abspath(): /path/to/test_realpath_link.py
#realpath(): /path/to/test_realpath.py (symlink resolved)
Directory & File Management Using Modules os and shutil
The modules os and shutil provide functions for managing directories and files.
Directory Management
File Management
For example [TODO],
>>> import os
>>> dir(os)
......
>>> help(os)
......
>>> help(os.getcwd)
......
>>> os.getcwd()
... current working directory ...
>>> os.listdir()
... contents of current directory ...
>>> os.chdir('test-python')
>>> exec(open('hello.py').read())
>>> os.system('ls -l')
>>> os.name
'posix'
>>> os.mkdir('sub_dir')
>>> os.makedirs('/path/to/sub_dir')
>>> os.remove('filename')
>>> os.rename('oldFile', 'newFile')
List a Directory
For example,
>>> import os
>>> help(os.listdir)
......
>>> os.listdir()
[..., ..., ...]
>>> for f in sorted(os.listdir('/usr')): print(f)
......
>>> for f in sorted(os.listdir('/usr')): print(os.path.join('/usr', f))
......
List a Directory Recursively via os.walk()
For example,
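A minimal sketch of a recursive listing with os.walk(), assuming a small throwaway directory tree built in a temporary directory (the helper name list_files_recursively and the file names are hypothetical):

```python
import os
import tempfile


def list_files_recursively(top):
    """Return the paths of all files under top, sorted."""
    found = []
    # os.walk() yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'sub', 'deeper'))
    for rel in ('a.txt',
                os.path.join('sub', 'b.txt'),
                os.path.join('sub', 'deeper', 'c.txt')):
        open(os.path.join(top, rel), 'w').close()   # create an empty file
    # Show paths relative to top, with '/' as separator for readability
    walked = [os.path.relpath(p, top).replace(os.sep, '/')
              for p in list_files_recursively(top)]

print(walked)
```

Joining dirpath with each name gives a usable path wherever the script is run from, which os.listdir() alone does not.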
List a Directory Recursively via Module glob (Python 3.5)
[TODO] Intro
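As a possible illustration, glob.glob() accepts recursive=True from Python 3.5 onward, in which case the pattern '**' matches any number of nested directories. The sketch below assumes a throwaway tree with hypothetical file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'sub'))
    for rel in ('a.txt', 'b.log', os.path.join('sub', 'c.txt')):
        open(os.path.join(top, rel), 'w').close()   # create an empty file
    # '**' matches zero or more directory levels when recursive=True
    pattern = os.path.join(top, '**', '*.txt')
    matches = sorted(glob.glob(pattern, recursive=True))
    txt_files = [os.path.relpath(p, top).replace(os.sep, '/') for p in matches]

print(txt_files)
```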
Copying File
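One way to copy a file is shutil.copy(), sketched below with hypothetical file names in a temporary directory. shutil.copy2() is a variant that also tries to preserve file metadata such as timestamps.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, 'in.txt')     # hypothetical source name
    dest = os.path.join(tmp_dir, 'out.txt')   # hypothetical destination name
    with open(src, 'w') as f:
        f.write('apple\n')
    # shutil.copy() copies the file content and returns the destination path
    returned_path = shutil.copy(src, dest)
    with open(dest) as f:
        copied = f.read()

print(repr(copied))
```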
Shell Command [TODO]
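Besides os.system(), external commands can be run with subprocess.run(), which also captures the output. The sketch below runs a child Python interpreter rather than a platform-specific command such as ls, purely so that the example is portable. The command itself is an assumption for illustration.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# check=True raises CalledProcessError if the exit status is non-zero.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from child")'],
    capture_output=True,
    text=True,
    check=True,
)
child_output = result.stdout.strip()
print(child_output)
```

Passing the command as a list avoids shell quoting issues, since no shell is involved.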
Environment Variables [TODO]
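Environment variables are available through os.environ, which behaves like a dictionary of strings. A minimal sketch, using the made-up variable names DEMO_GREETING and DEMO_NOT_SET:

```python
import os

# Setting a key changes the environment of this process and its children
os.environ['DEMO_GREETING'] = 'hello'          # hypothetical variable name
greeting = os.environ.get('DEMO_GREETING')

# get() returns the given default when the variable is not set
missing = os.environ.get('DEMO_NOT_SET', 'fallback')

# Remove the demo variable again
del os.environ['DEMO_GREETING']

print(greeting, missing)
```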
fileinput Module
The fileinput module iterates over the lines of all files named on the command line, or of standard input if none is given.
import fileinput

def main():
    lineNumber = 0
    for line in fileinput.input():
        line = line.rstrip()
        lineNumber += 1
        print('{}: {}'.format(lineNumber, line))

if __name__ == '__main__':
    main()
Text Processing
For simple text string operations such as string search and replacement, you can use the built-in string functions (e.g., str.find() and str.replace()).
String Operations
The built-in class str provides member functions for text string manipulation.
Strip whitespaces (blank, tab and newline)
Uppercase/Lowercase
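The stripping and case-conversion methods of str can be sketched as follows. The sample string is made up for illustration. Because strings are immutable, each call returns a new string.

```python
s = '  \tHello, World\n'

stripped = s.strip()    # remove leading and trailing whitespace
left = s.lstrip()       # remove leading whitespace only
right = s.rstrip()      # remove trailing whitespace only

upper = stripped.upper()
lower = stripped.lower()

print(repr(stripped), repr(left), repr(right))
print(upper, lower)
```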
Find
For example,
>>> s = '/test/in.txt'
>>> s.find('in')
6
>>> s[0 : s.find('in')] + 'out.txt'
'/test/out.txt'
Find and Replace
For example,
>>> s = 'hello hello hello, world'
>>> help(s.replace)
>>> s.replace('ll', '**')
'he**o he**o he**o, world'
>>> s.replace('ll', '**', 2)
'he**o he**o hello, world'
Split into Tokens and Join
For example,
>>> 'apple, orange, pear'.split()
['apple,', 'orange,', 'pear']
>>> 'apple, orange, pear'.split(', ')
['apple', 'orange', 'pear']
>>> 'apple, orange, pear'.split(', ', maxsplit=1)
['apple', 'orange, pear']
>>> ', '.join(['apple', 'orange, pear'])
'apple, orange, pear'
Regular Expression in Module re
I assume that you are familiar with regex; otherwise, you could read:
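The re module provides support for regular expressions.
>>> import re
>>> dir(re)
......
>>> help(re)
......
Backslash (\), Python Raw String r'...' vs Regular String
Regex's syntax uses backslash (\) both for metacharacters such as \d (digit), \s (whitespace) and \w (word character), and to escape special regex characters such as \. and \+.
On the other hand, Python's regular strings also use backslash for escape sequences, e.g., \n for newline and \t for tab. To write the regex pattern \d+ in a regular string, you need to double the backslash and write '\\d+'.
Python's solution is using raw string with a prefix r, in the form r'...', which leaves backslashes uninterpreted, so the same pattern is simply r'\d+'. Furthermore, Python denotes parenthesized back references (or capturing groups) as \1, \2, \3, ..., which can be written as r'\1', r'\2' in raw strings. I suggest that you use raw strings for regex pattern strings and replacement strings.
Compiling (Creating) a Regex Pattern Object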
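For example,
>>> import re
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> type(p1)
Invoking Regex Operations
You can invoke most of the regex functions in two ways: as a method of a compiled pattern object, e.g. p1.findall(str), or as a module-level function that takes the pattern as its first argument, e.g. re.findall(pattern, str).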
Find using findall()
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.findall('123 456')
['123', '456']
>>> p1.findall('abc')
[]
>>> p1.findall('abc123xyz456_7_00')
['123', '456', '7', '0', '0']
>>> re.findall(r'[1-9][0-9]*|0', '123 456')
['123', '456']
>>> re.findall(r'[1-9][0-9]*|0', 'abc')
[]
>>> re.findall(r'[1-9][0-9]*|0', 'abc123xyz456_7_00')
['123', '456', '7', '0', '0']
Replace using sub() and subn()
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.sub(r'**', 'abc123xyz456_7_00')
'abc**xyz**_**_****'
>>> p1.subn(r'**', 'abc123xyz456_7_00')
('abc**xyz**_**_****', 5)
>>> p1.sub(r'**', 'abc123xyz456_7_00', count=3)
'abc**xyz**_**_00'
>>> re.sub(r'[1-9][0-9]*|0', r'**', 'abc123xyz456_7_00')
'abc**xyz**_**_****'
>>> re.sub(p1, r'**', 'abc123xyz456_7_00')
'abc**xyz**_**_****'
>>> re.subn(p1, r'**', 'abc123xyz456_7_00', count=3)
('abc**xyz**_**_00', 3)
>>> re.subn(p1, r'**', 'abc123xyz456_7_00', count=10)
('abc**xyz**_**_****', 5)
Notes: For simple string replacement, use str.replace().
Using Parenthesized Back-References \1, \2, ... in Substitution and Pattern
In Python, regex parenthesized back-references (capturing groups) are denoted as \1, \2, and so on.
For example,
>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc')
'bbb aaa ccc'
>>> re.sub(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc ddd')
'bbb aaa ddd ccc'
>>> re.subn(r'(\w+) (\w+)', r'\2 \1', 'aaa bbb ccc ddd eee')
('bbb aaa ddd ccc eee', 2)
>>> re.subn(r'(\w+) \1', r'\1', 'hello hello world again again')
('hello world again', 2)
Find using search() and Match Object
The search() function returns a Match object for the first match, or None if there is no match.
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> inStr = 'abc123xyz456_7_00'
>>> m = p1.search(inStr)
>>> m
<_sre.SRE_Match object; span=(3, 6), match='123'>
>>> m.group()
'123'
>>> m.span()
(3, 6)
>>> m.start()
3
>>> m.end()
6
>>> m = p1.search(inStr, m.end())
>>> m
<_sre.SRE_Match object; span=(9, 12), match='456'>
>>> m = p1.search(inStr)
>>> while m:
...     print(m, m.group())
...     m = p1.search(inStr, m.end())
...
<_sre.SRE_Match object; span=(3, 6), match='123'> 123
<_sre.SRE_Match object; span=(9, 12), match='456'> 456
<_sre.SRE_Match object; span=(13, 14), match='7'> 7
<_sre.SRE_Match object; span=(15, 16), match='0'> 0
<_sre.SRE_Match object; span=(16, 17), match='0'> 0
To retrieve the back-references (or capturing groups) inside the Match object:
>>> p2 = re.compile(r'(A)(\w+)', re.IGNORECASE)
>>> inStr = 'This is an apple.'
>>> m = p2.search(inStr)
>>> while m:
...     print(m)
...     print(m.group())
...     print(m.groups())
...     for idx in range(1, m.lastindex + 1):
...         print(m.group(idx), end=',')
...     print()
...     m = p2.search(inStr, m.end())
...
<_sre.SRE_Match object; span=(8, 10), match='an'>
an
('a', 'n')
a,n,
<_sre.SRE_Match object; span=(11, 16), match='apple'>
apple
('a', 'pple')
a,pple,
Find using match() and fullmatch()
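The match() function matches only at the beginning of the string, while fullmatch() requires the whole string to match. Both return None if there is no match.
For example,
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> m = p1.match('aaa123zzz456')
>>> m        # None: nothing is printed
>>> m = p1.match('123zzz456')
>>> m
<_sre.SRE_Match object; span=(0, 3), match='123'>
>>> m = p1.fullmatch('123456')
>>> m
<_sre.SRE_Match object; span=(0, 6), match='123456'>
>>> m = p1.fullmatch('123456abc')
>>> m        # None: nothing is printed
Find using finditer()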
The finditer() function is similar to findall(), but returns an iterator of Match objects instead of a list of strings.
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> inStr = 'abc123xyz456_7_00'
>>> p1.findall(inStr)
['123', '456', '7', '0', '0']
>>> for s in p1.findall(inStr): print(s, end=' ')
...
123 456 7 0 0
>>> for m in p1.finditer(inStr): print(m)
...
<_sre.SRE_Match object; span=(3, 6), match='123'>
<_sre.SRE_Match object; span=(9, 12), match='456'>
<_sre.SRE_Match object; span=(13, 14), match='7'>
<_sre.SRE_Match object; span=(15, 16), match='0'>
<_sre.SRE_Match object; span=(16, 17), match='0'>
>>> for m in p1.finditer(inStr): print(m.group(), end=' ')
...
123 456 7 0 0
Splitting String into Tokens
The split() function splits the string at each match of the pattern and returns the list of pieces.
>>> p1 = re.compile(r'[1-9][0-9]*|0')
>>> p1.split('aaa123bbb456ccc')
['aaa', 'bbb', 'ccc']
>>> re.split(r'[1-9][0-9]*|0', 'aaa123bbb456ccc')
['aaa', 'bbb', 'ccc']
Notes: For a simple delimiter, use str.split().
Web Scraping
Web Scraping (or web harvesting or web data extraction) refers to reading the raw HTML page to retrieve desired data. Needless to say, you need to master HTML, CSS and JavaScript. Python supports web scraping via packages requests and BeautifulSoup (bs4).
Install Packages
You could install the relevant packages using pip:
$ pip install requests
$ pip install bs4
Step 0: Inspect the Target Webpage
Step 1: Send an HTTP GET request to the target URL to retrieve the raw HTML page using module requests
>>> import requests
>>> url = "http://your_target_webpage"
>>> response = requests.get(url)
>>> type(response)
Step 2: Parse the HTML Text into a Tree-Structure using BeautifulSoup and Search the Desired Data
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(response.text, "html.parser")
>>> type(soup)
You could write out the selected data to a file:
with open(filename, 'w') as fp:
    for row in rows:
        fp.write(row + '\n')
You could also use the csv module:
>>> import csv
>>> with open(filename, 'w', newline='') as fp:
...     writer = csv.DictWriter(fp, ['colHeader1', 'colHeader2', 'colHeader3'])
...     writer.writeheader()
...     for row in rows:
...         writer.writerow(row)
...
Step 3: Download Selected Document Using urllib.request
You may want to download documents such as text files or images.
>>> import urllib.request
>>> downloadUrl = '.....'
>>> file = '......'
>>> urllib.request.urlretrieve(downloadUrl, file)
Step 4: Delay
To avoid spamming a website with download requests (and being flagged as a spammer), you need to pause your code for a while.
>>> import time
>>> time.sleep(1)
Which method is used to test whether an element is a file or directory?
File isFile() method in Java with Examples
This function determines whether the path denoted by the abstract pathname is a file. It returns true if the abstract pathname denotes a file, and false otherwise.
Which of the following methods deletes a file or directory represented by an instance of the Java io File class?
static File createTempFile(String prefix, String suffix): As the name suggests, this function creates a temporary file inside the default temporary-file directory.
boolean delete(): This function deletes the directory or the file denoted by the abstract pathname. It returns true if the deletion succeeds, and false otherwise.
Which method is used to check file existence?
The exists() function is a part of the File class in Java. This function determines whether the file or directory denoted by the abstract pathname exists. It returns true if the path exists, and false otherwise. Parameters: This method does not accept any parameter.
Which of the following methods automatically creates a new empty file named by this abstract pathname?
createNewFile. Atomically creates a new, empty file named by this abstract pathname if and only if a file with this name does not yet exist.
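For comparison with the Java File methods above, the rough Python counterparts can be sketched as follows. This is only an analogy, not a claim that the two APIs behave identically. The file name new.txt is hypothetical, and opening with mode 'x' is used here as the nearest equivalent of createNewFile(), since it fails if the file already exists.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'new.txt')   # hypothetical file name

    is_dir = os.path.isdir(tmp_dir)           # compare: isDirectory()
    exists_before = os.path.exists(path)      # compare: exists()

    # Mode 'x' creates the file and raises FileExistsError if it exists
    with open(path, 'x'):
        pass
    is_file = os.path.isfile(path)            # compare: isFile()

    try:
        with open(path, 'x'):
            pass
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    os.remove(path)                           # compare: delete()
    exists_after = os.path.exists(path)

print(is_dir, exists_before, is_file, second_create_failed, exists_after)
```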