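Several hedge funds say the “alternative data” industry, which sells investment groups information such as app downloads and credit card purchases, is failing to adequately strip out personal details before sharing the data.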
The rapidly growing world of big data is seen as an increasingly attractive source of information for asset managers seeking a vital investment edge, with data providers selling everything from social media chatter and emailed receipts to federal lobbying data and even satellite imagery. But several hedge funds say some vendors are selling data that still contains sensitive personal details that could be used to identify individuals.
“The vendors claim to strip out all the personal information, but we occasionally find phone numbers, zip codes and so on,” said Matthew Granade, chief market intelligence officer at Steven Cohen’s Point72. “It’s a big enough deal that we have a couple of full-time tech people wash the data ourselves.”
The head of another big hedge fund said that even when personal information had been scrubbed from a data set, it was far too easy to restore.
“We were shocked at how easy it was to de-anonymise the data,” he said. “It took one of my analysts 30 minutes to discover someone who was probably having an affair.”
Sophisticated techniques such as machine learning allow money managers to sift through enormous data sets for profitable patterns. But the sensitivity of some of the information on offer has raised concerns. Robert Schoshinski, assistant director in the Federal Trade Commission’s division of privacy and identity protection, said the issue was “on the FTC’s radar”, but he declined to say whether there were any open investigations into misuse of data.
Tammer Kamel, chief executive of Quandl, a well-regarded alternative data vendor, said his company was “super zealous” about scrubbing any personal information from its aggregated data. “No one wants to be on the wrong side of this,” he said.
Another hedge fund manager pointed out that if legal issues arose, the litigation axe would be more likely to fall on the funds than on the data vendors. “We are incredibly careful about licensing and privacy issues because when things go wrong legally, the plaintiffs go after the people with the money,” he said.