murl

Introduction

About

murl is a URI manipulation module aimed at web use. It was motivated by the Python URI Quora challenge (see note), which is why it has its own implementations of functions already available in urllib and similar components of the Python Standard Library.

The idea is to parse an existing URI into a Murl object and manipulate its components flexibly according to the standards mentioned in RFC 3986 and other relevant documentation (details on that in the rules below), or alternatively, create an empty Murl object and add/change components on the fly.

Logo Credit

Note

The challenge was removed sometime between October 13th and October 29th, 2016. A copy of the prompt is still available, however, in my blog post.

Rules

General URI syntax:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
  • Scheme must start with a letter, followed by letters, digits, +, ., or -, and then a colon.
  • Authority part, which has an optional username/password, a host, and an optional port, must start with // and ends at the end of the URI or at the next /, ?, or #.
  • Path must begin with / whenever there’s an authority present. Can begin with / if not, but never //.
  • Query must begin with ? and is usually a string of key=value pairs delimited by & or ;.
  • Keys in Query can be duplicates and indicate multiple values for the same thing.
  • Fragment must begin with # and span until the end of the URI.
  • All unsafe characters, and reserved characters used as data rather than as delimiters, must be percent-encoded in % HEXDIG HEXDIG format in any given URI component.
  • Domains (without subdomains, etc) must consist of one segment that (might start with and) ends with a . plus a public suffix, where a public suffix may have a number of dot-delimited segments and a wide range of lengths itself (see Public Suffix list).
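
As a rough illustration of the general syntax above, the sketch below splits a URI with the regular expression given in RFC 3986, Appendix B. It only uses the standard library. split_uri is a hypothetical helper name, not part of murl, and the regex only separates the components without validating them.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def split_uri(uri):
    """Split a URI into its five general components (hypothetical helper)."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    return {
        'scheme': m.group(2),
        'authority': m.group(4),
        'path': m.group(5),
        'query': m.group(7),
        'fragment': m.group(9),
    }


parts = split_uri('https://scott:tiger@test.me:8080/more.html?this=that#hello')
print(parts['scheme'])     # https
print(parts['authority'])  # scott:tiger@test.me:8080
print(parts['path'])       # /more.html
print(parts['query'])      # this=that
print(parts['fragment'])   # hello
```

Components that are absent come back as None, except the path, which is an empty string when missing.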

Installation

1. Get the package:

$ pip install mrf-murl

2. Import it in your program/script:

from mrf_murl import Murl

General Use

1. Create a Murl object using an existing relative or absolute valid URI

foo = Murl('https://test.me?this=that')

or without any parameters for an empty object where you create components on the fly:

bar = Murl()

2. Add/change/get URI components using the object’s properties and methods.

Below is a quick, non-comprehensive example. Note that getting a value that doesn’t exist will either return None or raise an error; see the comprehensive docs for details:

# Entire assembled URI
# ---------------------
print(foo) # or use str(foo)

# Scheme
# ------
## Get
scheme = foo.scheme # returns a string, e.g. 'https'
## Set
foo.scheme = 'http'

# Authority
# ---------
## Host
## ====
### Get
host = foo.host
### Set
foo.host = 'google.com'

## Authentication (username/password)
## ==================================
### Get (Either together or individually)
auth = foo.auth # returns a dict with the keys 'username' and 'password'
username = foo.username # returns a string
password = foo.password # returns a string
### Set (Either together or individually)
foo.auth = dict(username='scott', password='tiger')
foo.username = 'scott'
foo.password = 'tiger'

## Port
## ====
### Get
port = foo.port # returns an int
### Set
foo.port = 25

# Path
# ----
## Get
path = foo.path # returns string
## Set
foo.path = '/more.html' # does not have to start with /.
# If / is required and not present, it will be added to the assembled URI.

# Query
# -----
## Get
querystring = foo.queryString
singleQuery = foo.getQuery('this') # returns a list of the key's values, decoded
## Set
foo.addQuery('this', 'those') # will add key=value pair even if key exists
foo.changeQuery('this', 'not') # will delete all prev key values & add this val
foo.removeQuery('this') # will remove all values for this key
# see docs to change/remove one key=value pair even if key has mult values

# Fragment
# --------
## Get
fragment = foo.fragment # returns a string
## Set
foo.fragment = 'hello'

print(foo) # http://scott:tiger@google.com:25/more.html#hello

Docs

Murl

The URI object class.

class mrf_murl.Murl(url='', queryDelim='&')

Initialize an instance of Murl with the following optional params:

  • url: String with URI, default: empty string.
  • queryDelim: char to separate key=value pairs, default: &,
    recommended: & or ;.

Initialization can raise a ValueError if:

  • There is an ‘@’ in an existing Authority without a complete
    username:password pair before it.
  • There is no host specified in an existing Authority.
  • Host has imbalanced IPv6 brackets.
  • Port number is not between 1 and 65535 inclusive.
  • There is more than one colon in Authority outside username:password
    and without the host being IPv6.
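
The error conditions above can be sketched roughly as follows. split_authority is a hypothetical helper written for illustration, not murl's actual code; it also rejects an empty or non-numeric port, which is an assumption of this sketch rather than documented behavior.

```python
def split_authority(authority):
    """Split '[user:password@]host[:port]' into a dict.

    Hypothetical helper for illustration; not murl's actual code.
    """
    username = password = port = None
    hostport = authority

    if '@' in authority:
        # Everything before the last '@' is the userinfo part.
        userinfo, hostport = authority.rsplit('@', 1)
        username, sep, password = userinfo.partition(':')
        if not (sep and username and password):
            raise ValueError("'@' present without a username:password pair")

    if hostport.startswith('['):
        # Bracketed IPv6 literal: colons inside the brackets belong to the host.
        end = hostport.find(']')
        if end == -1:
            raise ValueError('imbalanced IPv6 brackets')
        host, rest = hostport[:end + 1], hostport[end + 1:]
        if rest and not rest.startswith(':'):
            raise ValueError('unexpected text after IPv6 host')
        port_text = rest[1:] if rest else None
    else:
        if '[' in hostport or ']' in hostport:
            raise ValueError('imbalanced IPv6 brackets')
        if hostport.count(':') > 1:
            raise ValueError('more than one colon outside an IPv6 host')
        host, sep, port_text = hostport.partition(':')
        if not sep:
            port_text = None

    if not host:
        raise ValueError('no host specified')

    if port_text is not None:
        # Assumption of this sketch: empty or non-numeric ports are rejected too.
        if not port_text.isdigit() or not 1 <= int(port_text) <= 65535:
            raise ValueError('port must be between 1 and 65535')
        port = int(port_text)

    return {'username': username, 'password': password,
            'host': host, 'port': port}


print(split_authority('scott:tiger@google.com:25'))
```
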
The entire assembled URI, as a str, is available through the standard
str() function.

All properties, if not already set, will return/be None on get.

addQuery(key, val, spaceIsPlus=True)

Add a single key=value pair to the Query part. Function will detect if key/value is already encoded. Params:

  • key (str): key for this pair. Can be existing key to add to its
    values.
  • val (str): value for this pair.
  • spaceIsPlus (bool): Optional. True if space should be encoded as
    + instead of %20.
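
To make the two space encodings concrete, here is a small standard-library illustration. It uses urllib.parse rather than murl itself, so it only shows what the two encodings look like, not how addQuery is implemented.

```python
from urllib.parse import quote, quote_plus

value = 'hello world'

# Space encoded as '+', the form-style convention (spaceIsPlus=True).
as_plus = quote_plus(value)

# Space encoded as '%20', plain percent-encoding (spaceIsPlus=False).
as_percent = quote(value, safe='')

print(as_plus)     # hello+world
print(as_percent)  # hello%20world
```
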
auth

Get or set authentication part as a dict with the keys ‘username’ and ‘password’. Raises ValueError on set if:

  • Value is not a dict with the keys ‘username’ and ‘password’.

On set, detects if username/password are already percent-encoded.

changeQuery(key, newVal, val=None, spaceIsPlus=True)
Change one or all values for a specific key. Raises a KeyError if:
  • Key does not exist in the URI’s Query.
And a ValueError if:
  • A value is specified but it does not exist for this key.

Function detects if key/values are already encoded. Params:

  • key (str): key whose value(s) you want to change.
  • newVal (str): new value instead of the pre-existing value(s).
  • val (str): Old value if only one value is to be changed instead
    of all.
  • spaceIsPlus (bool): Optional. True if space should be encoded as
    + instead of %20.
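
The change-one-or-all behavior described above can be sketched on a plain dict that maps each key to a list of values. change_query is a hypothetical stand-in, not murl's code, and it ignores percent-encoding entirely.

```python
def change_query(query, key, new_val, val=None):
    """Change one or all values for a key in a dict of lists (in place).

    Hypothetical sketch; encoding detection is left out.
    """
    if key not in query:
        raise KeyError(key)
    if val is None:
        # No old value given: replace every value for this key.
        query[key] = [new_val]
        return
    if val not in query[key]:
        raise ValueError('value %r does not exist for key %r' % (val, key))
    # Replace only the first occurrence of the given old value.
    query[key][query[key].index(val)] = new_val


query = {'this': ['that', 'those']}

change_query(query, 'this', 'not', val='those')
print(query)  # {'this': ['that', 'not']}

change_query(query, 'this', 'only')
print(query)  # {'this': ['only']}
```
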
domain
Get only the domain of the URI.
If the host is an IPv4/IPv6 address or no registered public suffix is found, domain = host. The longest matching public suffix is assumed to be the host’s public suffix. For example:

amazon.com.mx

would match both .mx and .com.mx. The longer one is .com.mx, so it is taken as the public suffix.
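
A minimal sketch of the longest-suffix rule follows. SAMPLE_SUFFIXES is a tiny hand-picked set used only for illustration (the real Public Suffix List is far larger), and registered_domain is a hypothetical helper, not murl's implementation.

```python
# Tiny hand-picked sample; the real Public Suffix List is far larger.
SAMPLE_SUFFIXES = {'com', 'mx', 'com.mx', 'uk', 'co.uk'}


def registered_domain(host, suffixes=SAMPLE_SUFFIXES):
    """Return one label plus the longest matching public suffix."""
    labels = host.lower().split('.')
    # Candidates get shorter as i grows, so the first hit is the longest
    # suffix. Starting at 1 always leaves one label for the domain name.
    for i in range(1, len(labels)):
        candidate = '.'.join(labels[i:])
        if candidate in suffixes:
            return '.'.join(labels[i - 1:])
    # No known suffix (e.g. an IP address or 'localhost'): domain = host.
    return host


print(registered_domain('www.amazon.com.mx'))  # amazon.com.mx
print(registered_domain('127.0.0.1'))          # 127.0.0.1
```
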

fragment

Get or set the fragment of the URI. Returns a decoded fragment on get. On set, detects if the fragment is already percent-encoded.

getQuery(key, decodeVal=True, spaceIsPlus=True)
Get list of values for a given key. Raises a KeyError if:
  • Key does not exist in this URI’s Query.

Function will detect if key is already encoded. Params:

  • key (str): key whose values you want as a list.
  • decodeVal (bool): Optional. True if values are to be returned
    decoded.
  • spaceIsPlus (bool): Optional. True if space is encoded as
    + instead of %20.

host

Get or set host. Raises a ValueError on set if:

  • Value has imbalanced IPv6 brackets (‘[’ but no following ‘]’,
    or vice versa).

password

Get or set password individually. Raises a ValueError on set if:

  • Host has not yet been set.
  • Username is not already set.

On set, detects if password is already percent-encoded.

path

Get or set path. Raises a ValueError on set if:

  • Path starts with two forward slashes.

On set, detects if path is already percent-encoded. On get, unlike other components, the encoded path is returned.

port

Get or set port number. Raises a ValueError on set if:

  • Host has not yet been set.
  • Port is not an int between 1 and 65535, inclusive.

queryDelim

Get or set the current query delimiter.

queryString

Get assembled querystring for the URI.

removeQuery(key, val=None, spaceIsPlus=True)
Remove one or all values for a specific key. Raises a KeyError if:
  • Key does not exist in the URI’s Query.
And a ValueError if:
  • A value is specified but it does not exist for this key.

Function detects if key/value are already encoded. Params:

  • key (str): key whose value(s) you want to delete.
  • val (str): Optional. Current value if only one value is to be
    deleted instead of all.
  • spaceIsPlus (bool): Optional. True if space should be encoded
    as + instead of %20.

scheme

Get or set the current scheme. Raises a ValueError on set if:

  • Scheme doesn’t comply with standard format
    (letter that can be followed by letters, digits, +, ., or -).
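
The scheme format can be checked with a short regular expression. This is a sketch of the rule as stated, with is_valid_scheme as a hypothetical helper name; it is not taken from murl's source.

```python
import re

# A letter, followed by any number of letters, digits, '+', '.', or '-'.
SCHEME_RE = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*')


def is_valid_scheme(scheme):
    """Return True if the whole string matches the scheme format."""
    return SCHEME_RE.fullmatch(scheme) is not None


print(is_valid_scheme('https'))    # True
print(is_valid_scheme('git+ssh'))  # True
print(is_valid_scheme('1http'))    # False
```
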
username

Get or set username individually. Raises a ValueError on set if:

  • Host has not yet been set.
  • Password is not already set.

On set, detects if username is already percent-encoded.

URL Section (urlsec)

mrf_murl.urlsec.assembleURL(urldict, queryDelim='&')

Re-assemble a URL from a dict produced by divideURL(), or from one that uses the same structure. Params:

  • urldict (dict): dict with the same keys as the one returned by divideURL().
  • queryDelim (str): Optional. Delimiter for the query key=value pairs.
    Recommended to be ‘&’ or ‘;’.

mrf_murl.urlsec.divideURL(url, queryDelim='&')

Divides a URL into a dict with the following keys, whose values are an empty str if optional and not set:

  • scheme (str)
  • authority (dict. Keys: username, password, host, port)
  • path (str)
  • query (dict. Maps each query key to a list of its values)
  • fragment (str)

Can raise a ValueError if:

  • There is an ‘@’ in an existing Authority without a complete
    username:password pair before it.
  • There is no host specified in an existing Authority.
  • Host has imbalanced IPv6 brackets.
  • Port number is not between 1 and 65535 inclusive.
  • There is more than one colon in Authority outside username:password
    and without the host being IPv6.

Params:

  • url (str): the URI to be divided.
  • queryDelim (str): Optional. Delimiter for the query key=value pairs.
    Recommended: ‘&’ or ‘;’.
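
For a feel of the dict shape described above, the sketch below builds a similar structure with the standard library's urllib.parse. divide_url_sketch is a hypothetical name, and this is an approximation only: it is not murl's divideURL(), it only splits the query on &, and details such as error messages and host normalization may differ.

```python
from urllib.parse import parse_qs, urlsplit


def divide_url_sketch(url):
    """Approximate the documented dict layout using urllib.parse."""
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,
        'authority': {
            # Unset optional parts become empty strings, as documented above.
            'username': parts.username or '',
            'password': parts.password or '',
            'host': parts.hostname or '',
            'port': parts.port or '',
        },
        'path': parts.path,
        # Each query key maps to the list of its values.
        'query': parse_qs(parts.query, keep_blank_values=True),
        'fragment': parts.fragment,
    }


divided = divide_url_sketch(
    'https://scott:tiger@google.com:25/more.html?this=that&this=those#hello')
print(divided['authority'])  # username, password, host, and port
print(divided['query'])      # {'this': ['that', 'those']}
```
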

URL Encode/Decode (urlende)

mrf_murl.urlende.SAFE_CHARS = 'ABCDEFGHIJKLMNOPQRSTUVWSYZabcdefghijklmnopqrstuvwxyz0192837465-._~'

Characters that are always URI-safe.

mrf_murl.urlende.decode(part)

Decode a part of a URI which has already been percent-encoded. Params:

  • part (str): percent-encoded URI component.

mrf_murl.urlende.decode_query(part, plus=True)

Decode part of a query (a key or a value) or of a path. Params:

  • part (str): part to be decoded.
  • plus (bool): Optional. True if a space is represented as + instead of %20.

mrf_murl.urlende.encode(part, safe='/')

Percent-encode a component of a URI. Params:

  • part (str): component of the URI.
  • safe (str): Optional. Char(s) to leave unencoded in addition to SAFE_CHARS, for example
    reserved chars. Pass several chars as a single string.
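
A minimal percent-encoder with a safe parameter might look like the sketch below. percent_encode and UNRESERVED are hypothetical names; the sketch assumes UTF-8 and the RFC 3986 unreserved set, and it is not murl's encode().

```python
import string

# Unreserved characters from RFC 3986: letters, digits, '-', '.', '_', '~'.
UNRESERVED = string.ascii_letters + string.digits + '-._~'


def percent_encode(part, safe='/'):
    """Percent-encode every char that is neither unreserved nor in `safe`."""
    keep = set(UNRESERVED + safe)
    out = []
    for ch in part:
        if ch in keep:
            out.append(ch)
        else:
            # Encode each UTF-8 byte of the character as % HEXDIG HEXDIG.
            out.extend('%{:02X}'.format(byte) for byte in ch.encode('utf-8'))
    return ''.join(out)


print(percent_encode('/a b/ü'))        # /a%20b/%C3%BC
print(percent_encode('a/b', safe=''))  # a%2Fb
```
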
mrf_murl.urlende.encode_query(part, plus=True, safe='')

Encode something that is part of a query (key, or value). Params:

  • part (str): key or value.
  • plus (bool): Optional. True if space should be encoded as + instead of %20.
  • safe (str): Optional. Char(s) not to encode.