One of the key metrics for advertising companies is the conversion rate. This might be the percentage of people who click on an ad or install an app. Targeted ads are commonly used to show an ad to the right audience, thus increasing the likelihood that viewers convert.
So how do advertisers know whom to target? Doing so requires a great deal of knowledge about an individual. This might be gathered by tracking the apps and websites they visit, their daily routines, and their network of friends on social media. I think many people put off thinking about privacy because its benefits are neither immediate nor obvious. One (hypothetical) malicious use would be an insurance company setting discriminatory prices based on one’s usage patterns.
Nonetheless, there are a few valid use cases for limited tracking. For example, services want to identify users who reinstall an app to get a new free trial or to replace a previous fraudulent account. Or perhaps companies want to give users a discount for upgrading to a new version or for buying multiple apps1.
How tracking works
Suppose someone builds a tracking library (Triple Tap seems fitting) and convinces many apps to include it. What are some ways it can amalgamate user sessions across apps? Nowadays everyone uses a NAT router so there might be hundreds or even thousands of users behind a single IP address. The easiest way to distinguish users is to generate a UUID on startup. However, since mobile applications are sandboxed and cannot communicate with other apps, the library must rely on some information from the operating system to ensure consistency between apps. Assuming this library is included in a lot of mobile apps, it is then possible to build a list of apps used by a single user.
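To see why a UUID generated per app is not enough on its own, here is a minimal sketch in Python (used only to model the idea; it is not iOS code). Each sandbox is modelled as a separate dictionary, and the names `Sandbox` and `tracker_id` are hypothetical.

```python
import uuid


class Sandbox:
    """Models one app's private storage; apps cannot read each other's."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.storage = {}


def tracker_id(sandbox):
    """Return the library's ID for this app, generating it on first launch."""
    if "tracker_id" not in sandbox.storage:
        sandbox.storage["tracker_id"] = str(uuid.uuid4())
    return sandbox.storage["tracker_id"]


game = Sandbox("game")
news = Sandbox("news")

# The ID is stable across launches of the same app...
same_app_stable = tracker_id(game) == tracker_id(game)
# ...but differs between apps, so sessions cannot be joined on it.
cross_app_match = tracker_id(game) == tracker_id(news)
```

Because the two IDs never match, the library needs some value that the operating system hands out consistently to every app, which is what the identifiers below provide to varying degrees.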
[iOS 2 - 6] UDID
Historically iOS provided a convenient method, [[UIDevice currentDevice] uniqueIdentifier].
A unique device identifier is a hash value composed from various hardware identifiers such as the device’s serial number. It is guaranteed to be unique for every device but cannot publicly be tied to a user account. […]
Unfortunately this identifier has several downsides from a privacy standpoint. Although it cannot publicly be tied to a user account, many apps require users to sign in, which allows them to establish that link. Furthermore, since it is derived from physical hardware identifiers, the same value is returned even after you sell or recycle your device. Luckily the general community has become a lot more privacy conscious, and the API was deprecated in iOS 5 and removed by iOS 7.
[iOS 6 - Present] IDFA
In iOS 6 Apple introduced the Identifier for Advertisers (IDFA) to replace the deprecated UDID. This identifier was created specifically for advertising and tracking, and it offered some privacy benefits such as letting users generate a new identifier. It also gave users an option to limit ad tracking which, like the Do Not Track HTTP header, merely signals that the user does not want to be tracked. It wasn’t until iOS 10 that the identifier was actually withheld when the option is enabled, with an all-zero value returned in its place.
[iOS 6 - Present] identifierForVendor
At the same time, Apple introduced an API that gives all apps published by the same developer the same identifier on a given device. Tracking libraries cannot use this to link a user across apps from different developers, because those apps receive different identifiers.
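The observable behaviour can be modelled with a short Python sketch. How iOS actually derives the value is not covered here, so the hashing scheme below (a hash of a hypothetical per-device secret and the vendor name) is purely an assumption chosen to reproduce that behaviour, and `identifier_for_vendor` is an illustrative helper rather than the real API.

```python
import hashlib
import uuid


def identifier_for_vendor(device_secret, vendor):
    """Hypothetical model: the same (device, vendor) pair gives the same ID."""
    digest = hashlib.sha256(f"{device_secret}:{vendor}".encode()).digest()
    return str(uuid.UUID(bytes=digest[:16]))


device = "device-secret-A"  # stand-in for something only the OS knows

# Two apps from the same developer see the same value.
a1 = identifier_for_vendor(device, "Acme Inc.")
a2 = identifier_for_vendor(device, "Acme Inc.")

# An app from another developer sees an unrelated value, so a library
# embedded in both apps cannot join the two on this identifier.
b1 = identifier_for_vendor(device, "Other LLC")
```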
[iOS 2 - 11] MAC Addresses
When UDID was replaced by IDFA, some people were just not content with an identifier that could be reset. Thus many applications began using the Wi-Fi MAC address instead. This was short-lived, as iOS 7 began returning a constant 02:00:00:00:00:00. Nonetheless it was still possible to use the ARP table to retrieve the MAC address from the Wi-Fi router, at least until this was removed in iOS 11! Such a classic game of cat and mouse.
[iOS 3 - 8] canOpenURL
Although this API has been available since the early iPhone OS era, it seems that only in the last couple of years have apps adopted custom URLs as a priority feature. This makes sense because, with custom URL schemes and the ability to query whether a scheme is registered, an app can be smarter about either opening the target app or bringing up the App Store sheet. But who would have thought that any company would have the audacity to scan all the apps on one’s phone? This was addressed by limiting the number of queries and by requiring apps to specify ahead of time the URL schemes they intend to query, so that the list can be reviewed2.
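The scanning trick and the allowlist fix can be sketched as a Python model (again, not iOS code). The scheme names, the `installed` set, and the `can_open_url` helper are all made up for illustration; the `declared` set plays the role of the list an app must specify ahead of time.

```python
def can_open_url(scheme, installed, declared=None):
    """Model of the query: with an allowlist, undeclared schemes report False."""
    if declared is not None and scheme not in declared:
        return False
    return scheme in installed


installed = {"bankapp", "datingapp", "maps"}  # schemes registered on the device
candidates = ["bankapp", "datingapp", "fitness", "maps"]  # tracker's guess list

# Before the restriction: probing every candidate reveals the installed apps.
unrestricted = [s for s in candidates if can_open_url(s, installed)]

# After: only schemes declared up front (and visible to review) can be probed.
declared = {"maps"}
restricted = [s for s in candidates if can_open_url(s, installed, declared)]
```

In this model the unrestricted scan recovers every installed app on the guess list, while the restricted one learns only about the schemes the app openly declared.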
[iOS 8 - Present] iCloud Keychain
iCloud Keychain was introduced as users began using multiple devices and as Apple improved its cloud infrastructure. In its simplest form, one can think of it as a key-value dictionary that is synchronized with Apple’s servers and the rest of the user’s devices. However, a side effect of this persistence is that data stored on the servers is not deleted, even if the user has deleted the app from all of their devices. This is not to say that Apple is not aware of this issue - it’s just that preventing privacy leakage and giving apps persistence are inherently irreconcilable goals. In fact the iOS 10 betas initially changed the behaviour to perform the deletion, but this was reverted due to compatibility problems3. The bright side is that this only allows data to be stored persistently; apps still need to find a way of communicating with other apps.
[iOS 11 - Present] DeviceCheck
This provides developers with the ability to verify the authenticity of an Apple device and to store two bits of information on a per-device, per-developer basis. This is now the preferred way to record characteristics about a user, such as flagging fraudulent users and free trial usage, in a privacy-preserving manner. A lot of thought has been put into this. For example, although it provides a queryable timestamp (of when the bits were last updated), this is trimmed to provide only year and month granularity. This gives 720 (60 years * 12 months) possibilities if we assume dates can range from January 1970 to January 2030. However this can easily be (if it is not already) constrained to only accept timestamps within the last few months.
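To make the limits concrete, here is a small Python sketch of how little such a design can carry. It is a model only and does not call the DeviceCheck API; the helpers `trim_to_month`, `months_between`, and `pack_bits` are hypothetical names introduced for illustration.

```python
from datetime import datetime


def trim_to_month(ts):
    """Drop everything below month granularity, as the text describes."""
    return ts.year, ts.month


def months_between(start, end):
    """Number of whole months from start (inclusive) to end (exclusive)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def pack_bits(bit0, bit1):
    """Two booleans give only four distinguishable states (0 to 3)."""
    return (int(bit1) << 1) | int(bit0)


updated = trim_to_month(datetime(2018, 3, 14, 9, 26, 53))
window = months_between((2018, 1), (2018, 7))  # e.g. a six-month window
states = [pack_bits(a, b) for a in (False, True) for b in (False, True)]
```

The same `months_between` helper can be used to check the number of distinct year-month values in any assumed date range, and shrinking the accepted window shrinks that number accordingly.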
User tracking is common practice today, mostly for ad targeting. To do so, ad tracking libraries must be able to distinguish between the hundreds or thousands of users who may sit behind a single NAT router. Furthermore, apps must rely on the operating system to acquire an identifier that is consistent between apps. Historically iOS provided several convenient APIs to do so; however, these identifiers could identify users and persisted between installs.
iOS removed many sources of persistent identifiers and replaced them with privacy-respecting alternatives that provide most, if not all, of the necessary functionality. Nonetheless there are still other sources, albeit small (e.g. carrier name, device name, model, version), which together can be used to fingerprint a device. Maybe we should all go back to using a BlackBerry with separate work/personal profiles?
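As a rough illustration of how such small signals combine into a fingerprint, here is a Python sketch. The attribute names and values are invented, and hashing the sorted attribute pairs is just one simple way to do it, not a description of any particular tracking library.

```python
import hashlib


def fingerprint(attrs):
    """Hash the sorted attribute pairs into one stable value."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


phone_a = {
    "carrier": "Carrier X",
    "device_name": "Alice's iPhone",
    "model": "iPhone9,3",
    "os_version": "11.2",
}
# Same carrier, model, and version; only the device name differs.
phone_b = dict(phone_a, device_name="Bob's iPhone")

fp_a = fingerprint(phone_a)
fp_a_again = fingerprint(dict(phone_a))  # same attributes, same fingerprint
fp_b = fingerprint(phone_b)
```

No single attribute is unique, but one differing field is enough to separate the two devices, and the value stays stable for as long as the attributes do.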
Note: The iOS version ranges given in square brackets are inclusive.
3 ↩ Ironic anecdote: Deleting iCloud Keychain data probably would have prevented crashes on Frenzy if users had just restored their phone. The crashes were due to the app failing to decrypt the database, since it was using the key from the previous phone (backed up) on a newly created database (not backed up).
A selection of other sneaky methods: