There is a saying, ‘If it works, don’t touch it!’ I like it, but sometimes changes are requested by someone from the outside, and if that ‘someone’ is as big as Apple, we have to listen.
Recently, we decided to update some code related to URL encoding (frankly speaking, we should have addressed it earlier, but you know how things go…). The Apple compiler asked us to update the code with a deprecation warning for CFURLCreateStringByAddingPercentEscapes.
The piece of code that caused this warning was the following:
(__bridge_transfer NSString*)CFURLCreateStringByAddingPercentEscapes(NULL,
(CFStringRef)self, NULL, (CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
CFStringConvertNSStringEncodingToEncoding(encoding))
(By the way, this code came from one of Google’s libraries.)
The easiest ‘fix’ to silence deprecation with no effort would be this:
NSString *blackList = @"!*'\"();:@&=+$,/?%#[]% ";
NSCharacterSet *allowedCharacters = [NSCharacterSet characterSetWithCharactersInString:blackList].invertedSet;
NSString *encodedURL = [url stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacters];
However, this wasn’t a perfect solution in our case, because our app had to be able to handle any (valid) URL. The app doesn’t compose a URL from components but instead receives it as input (by decoding QR codes). Furthermore, given the existence of internationalised domains, it’s better to work with a set of allowed characters than with a black list. Thus, in our case, we should call (NSString *)stringByAddingPercentEncodingWithAllowedCharacters:(NSCharacterSet *)allowedCharacters with a genuine allowed set rather than an inverted black list.
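To illustrate why the inverted black list is a poor fit for whole URLs, here is a minimal Swift sketch. The sample URL is made up for illustration; the black list is the one from the snippet above. Applied to a complete URL string, it escapes the structural delimiters along with everything else:

```swift
import Foundation

// The black list from the original snippet, inverted into an "allowed" set.
let blackList = "!*'\"();:@&=+$,/?%#[]% "
let invertedBlackList = CharacterSet(charactersIn: blackList).inverted

// Hypothetical input, standing in for a URL decoded from a QR code.
let scannedURL = "https://example.com/path?q=1"

// ":", "/", "?" and "=" are all on the black list, so they are escaped too.
let overEncoded = scannedURL.addingPercentEncoding(withAllowedCharacters: invertedBlackList)
print(overEncoded ?? "nil")
// prints "https%3A%2F%2Fexample.com%2Fpath%3Fq%3D1"
```

The result is no longer usable as a URL, which is fine for encoding a single parameter value but not for input that is already a full URL.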
So what should allowedCharacters be? There are several standard character sets prefixed with ‘URL’ that could be used for this purpose:
URLFragmentAllowedCharacterSet
URLHostAllowedCharacterSet
URLPasswordAllowedCharacterSet
URLPathAllowedCharacterSet
URLQueryAllowedCharacterSet
URLUserAllowedCharacterSet
Should we use a union of all of these, or is one of them a superset of the others? We like to have things clear, so we decided to check and compared them in a simple playground.
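A playground along these lines can produce such a comparison. This is a reconstruction under assumptions, not the original playground: the helper name supersetReport and the output wording are illustrative, while CharacterSet and its isSuperset(of:) method come from Foundation.

```swift
import Foundation

// The predefined sets we want to test a candidate against.
let setsToCheck: [(name: String, set: CharacterSet)] = [
    ("urlUserAllowed", .urlUserAllowed),
    ("urlHostAllowed", .urlHostAllowed),
    ("urlPathAllowed", .urlPathAllowed),
    ("urlFragmentAllowed", .urlFragmentAllowed),
    ("urlPasswordAllowed", .urlPasswordAllowed),
]

// Hypothetical helper: one line per set, saying whether `candidate` covers it.
func supersetReport(candidateName: String, candidate: CharacterSet) -> [String] {
    return setsToCheck.map { entry -> String in
        let relation = candidate.isSuperset(of: entry.set) ? "superset" : "NOT superset"
        return "\(candidateName) \(relation) for \(entry.name)"
    }
}

let report = supersetReport(candidateName: "urlQueryAllowed", candidate: .urlQueryAllowed)
report.forEach { print($0) }
```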
The result:
urlQueryAllowed superset for urlUserAllowed
urlQueryAllowed NOT superset for urlHostAllowed
urlQueryAllowed superset for urlPathAllowed
urlQueryAllowed superset for urlFragmentAllowed
urlQueryAllowed superset for urlPasswordAllowed
It looked like the only missing part was URLHostAllowedCharacterSet. We added it and checked once again:
let allowedCharacters = NSMutableCharacterSet()
allowedCharacters.formUnion(with: .urlQueryAllowed)
allowedCharacters.formUnion(with: .urlHostAllowed)
And now it’s a ‘full house’:
allowedCharacters superset for urlUserAllowed
allowedCharacters superset for urlHostAllowed
allowedCharacters superset for urlPathAllowed
allowedCharacters superset for urlFragmentAllowed
allowedCharacters superset for urlPasswordAllowed
allowedCharacters superset for urlQueryAllowed
We got our answer: the union of urlQueryAllowed and urlHostAllowed contained all the characters.
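As a rough sketch of how the combined set might then be applied to an incoming URL string, the snippet below builds the same union with the CharacterSet value type and encodes a sample input. The input string and variable names are made up for illustration.

```swift
import Foundation

// Union of the two sets, built with the value-type API instead of NSMutableCharacterSet.
let urlAllowed = CharacterSet.urlQueryAllowed.union(.urlHostAllowed)

// Hypothetical input, standing in for a URL decoded from a QR code.
let rawURL = "https://example.com/a b?q=x y"

// Delimiters such as ":", "/", "?" and "=" are kept; spaces are escaped.
let encodedURL = rawURL.addingPercentEncoding(withAllowedCharacters: urlAllowed)
print(encodedURL ?? "nil")
// prints "https://example.com/a%20b?q=x%20y"
```

One caveat to keep in mind: "%" is not part of these sets, so an input that already contains percent escapes would be escaped a second time.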
In general, you must use the allowed character set that matches each part of a URL/URI. For example, apply urlPasswordAllowed only to user credentials (if you pass them) and urlQueryAllowed only to the query part.
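When the parts are available separately, per-component encoding could look like the following sketch. The path and query values are invented for illustration:

```swift
import Foundation

// Hypothetical components, each encoded with the set meant for that part.
let rawPath = "/docs/my file.txt"
let rawQuery = "name=John Smith&lang=en"

let encodedPath = rawPath.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed)
let encodedQuery = rawQuery.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed)

print(encodedPath ?? "nil")   // prints "/docs/my%20file.txt"
print(encodedQuery ?? "nil")  // prints "name=John%20Smith&lang=en"
```

Note that urlQueryAllowed leaves "&" and "=" untouched, so it suits a whole query string; a single parameter value that may itself contain those characters needs a stricter set.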
Happy URL encoding!
More to read: