Tag Archives: Data
MOSCOW/TORONTO (Reuters) – Moscow-based Kaspersky Lab plans to open a data center in Switzerland to address Western government concerns that Russia exploits its anti-virus software to spy on customers, according to internal documents seen by Reuters.
Kaspersky is setting up the center in response to actions in the United States, Britain and Lithuania last year to stop using the company’s products, according to the documents, which were confirmed by a person with direct knowledge of the matter.
The action is the latest effort by Kaspersky, a global leader in anti-virus software, to parry accusations by the U.S. government and others that the company spies on customers at the behest of Russian intelligence. The U.S. last year ordered civilian government agencies to remove the Kaspersky software from their networks.
Kaspersky has strongly rejected the accusations and filed a lawsuit against the U.S. ban.
The U.S. allegations were the “trigger” for setting up the Swiss data center, said the person familiar with Kaspersky’s Switzerland plans, but not the only factor.
“The world is changing,” they said, speaking on condition of anonymity when discussing internal company business. “There is more balkanisation and protectionism.”
The person declined to provide further details on the new project, but added: “This is not just a PR stunt. We are really changing our R&D infrastructure.”
A Kaspersky spokeswoman declined to comment on the documents reviewed by Reuters.
In a statement, Kaspersky Lab said: “To further deliver on the promises of our Global Transparency Initiative, we are finalizing plans for the opening of the company’s first transparency center this year, which will be located in Europe.”
“We understand that during a time of geopolitical tension, mirrored by an increasingly complex cyber-threat landscape, people may have questions and we want to address them.”
Kaspersky Lab launched a campaign in October to dispel concerns about possible collusion with the Russian government by promising to let independent experts scrutinize its software for security vulnerabilities and “back doors” that governments could exploit to spy on its customers.
The company also said at the time that it would open “transparency centers” in Asia, Europe and the United States but did not provide details. The new Swiss facility is dubbed the Swiss Transparency Centre, according to the documents.
Work in Switzerland is due to begin “within weeks” and be completed by early 2020, said the person with knowledge of the matter.
The plans have been approved by Kaspersky Lab CEO and founder Eugene Kaspersky, who owns a majority of the privately held company, and will be announced publicly in the coming months, according to the source.
“Eugene is upset. He would rather spend the money elsewhere. But he knows this is necessary,” the person said.
It is possible the move could be derailed by the Russian security services, who might resist moving the data center outside of their jurisdiction, people familiar with Kaspersky and its relations with the government said.
Western security officials said Russia’s FSB Federal Security Service, successor to the Soviet-era KGB, exerts influence over Kaspersky management decisions, though the company has repeatedly denied those allegations.
The Swiss center will collect and analyze files identified as suspicious on the computers of tens of millions of Kaspersky customers in the United States and European Union, according to the documents reviewed by Reuters. Data from other customers will continue to be sent to a Moscow data center for review and analysis.
Files would only be transmitted from Switzerland to Moscow in cases when anomalies are detected that require manual review, the person said, adding that about 99.6 percent of such samples do not currently undergo this process.
A third party will review the center’s operations to make sure that all requests for such files are properly signed, stored and available for review by outsiders including foreign governments, the person said.
Moving operations to Switzerland will address concerns about laws that enable Russian security services to monitor data transmissions inside Russia and force companies to assist law enforcement agencies, according to the documents describing the plan.
The company will also move the department which builds its anti-virus software using code written in Moscow to Switzerland, the documents showed.
Kaspersky has received “solid support” from the Swiss government, said the source, who did not identify specific officials who have endorsed the plan.
Reporting by Jack Stubbs in Moscow and Jim Finkle in Toronto; Editing by Jonathan Weber
SAN FRANCISCO (Reuters) – Facebook Inc faced new calls for regulation from within the U.S. Congress and was hit with questions about personal data safeguards on Saturday after reports that a political consultant gained inappropriate access to 50 million users’ data starting in 2014.
Facebook disclosed the issue in a blog post on Friday, hours before media reports that conservative-leaning Cambridge Analytica, a data company known for its work on Donald Trump’s 2016 presidential campaign, was given access to the data and may not have deleted it.
The scrutiny presented a new threat to Facebook’s reputation, which was already under attack over Russians’ alleged use of Facebook tools to sway American voters before and after the 2016 U.S. elections.
“It’s clear these platforms can’t police themselves,” Democratic U.S. Senator Amy Klobuchar tweeted.
“They say ‘trust us.’ Mark Zuckerberg needs to testify before Senate Judiciary,” she added, referring to Facebook’s CEO and a committee she sits on.
Facebook said the root of the problem was that researchers and Cambridge Analytica lied to it and abused its policies, but critics on Saturday threw blame at Facebook as well, demanding answers on behalf of users and calling for new regulation.
Facebook insisted the data was misused but not stolen, because users gave permission, sparking a debate about what constitutes a hack that must be disclosed to customers.
“The lid is being opened on the black box of Facebook’s data practices, and the picture is not pretty,” said Frank Pasquale, a University of Maryland law professor who has written about Silicon Valley’s use of data.
Pasquale said Facebook’s response that data had not technically been stolen seemed to obfuscate the central issue that data was apparently used in a way contrary to the expectations of users.
“It amazes me that they are trying to make this about nomenclature. I guess that’s all they have left,” he said.
Democratic U.S. Senator Mark Warner said the episode bolstered the need for new regulations about internet advertising, describing the industry as the “Wild West.”
“Whether it’s allowing Russians to purchase political ads, or extensive micro-targeting based on ill-gotten user data, it’s clear that, left unregulated, this market will continue to be prone to deception and lacking in transparency,” he said.
With Republicans controlling the Senate’s majority, though, it was not clear if Klobuchar and Warner would prevail.
The New York Times and London’s Observer reported on Saturday that private information from more than 50 million Facebook users improperly ended up in the hands of Cambridge Analytica, and the information has not been deleted despite Facebook’s demands beginning in 2015.
Some 270,000 people allowed use of their data by a researcher, who scraped the data of all their friends as well, a move allowed by Facebook until 2015. The researcher sold the data to Cambridge, which was against Facebook rules, the newspapers said.
Cambridge Analytica worked on Trump’s 2016 campaign. A Trump campaign official said, though, that it used Republican data sources, not Cambridge Analytica, for its voter information.
Facebook, in a series of written statements beginning late on Friday, said its policies had been broken by Cambridge Analytica and researchers and that it was exploring legal action.
Cambridge Analytica in turn said it had deleted all the data and that the company supplying it had been responsible for obtaining it.
Andrew Bosworth, a Facebook vice president, hinted the company could make more changes to demonstrate it values privacy. “We must do better and will,” he wrote on Twitter, adding that “our business depends on it at every level.”
Facebook said it asked for the data to be deleted in 2015 and then relied on written certifications by those involved that they had complied.
Nuala O’Connor, president of the Center for Democracy & Technology, an advocacy group in Washington, D.C., said Facebook was relying on the good will of decent people rather than preparing for intentional misuse.
Moreover, she found it puzzling that Facebook knew about the abuse in 2015 but did not disclose it until Friday. “That’s a long time,” she said.
Britain’s data protection authority and the Massachusetts attorney general on Saturday said they were launching investigations into the use of Facebook data.
“It is important that the public are fully aware of how information is used and shared in modern political campaigns and the potential impact on their privacy,” UK Information Commissioner Elizabeth Denham said in a statement.
Massachusetts Attorney General Maura Healey’s office said she wants to understand how the data was used, what policies if any were violated and what the legal implications are.
Reporting by David Ingram; Editing by Peter Henderson and Chris Reese
(Reuters) – Facebook Inc on Friday said it was suspending political data analytics firm Cambridge Analytica, which worked for President Donald Trump’s 2016 election campaign, after finding data privacy policies had been violated.
Facebook said in a statement that it suspended Cambridge Analytica and its parent group Strategic Communication Laboratories (SCL) after receiving reports that they did not delete information about Facebook users that had been inappropriately shared.
Cambridge Analytica was not immediately available for comment. Facebook did not mention the Trump campaign or any political campaigns in its statement, attributed to company Deputy General Counsel Paul Grewal.
“We will take legal action if necessary to hold them responsible and accountable for any unlawful behavior,” Facebook said, adding that it was continuing to investigate the claims.
Cambridge Analytica worked for the failed presidential campaign of U.S. Senator Ted Cruz and then for the presidential campaign of Donald Trump. On its website, it says it “provided the Donald J. Trump for President campaign with the expertise and insights that helped win the White House”.
Brad Parscale, who ran Trump’s digital ad operation in 2016 and is his 2020 campaign manager, declined to comment on Friday.
In past interviews with Reuters, Parscale has said that Cambridge Analytica played a minor role as a contractor in the 2016 Trump campaign, and that the campaign used voter data from a Republican-affiliated organization rather than Cambridge Analytica.
Facebook’s Grewal said the company was taking the unusual step of announcing the suspension “given the public prominence” of Cambridge Analytica and its parent organization.
The suspension means Cambridge Analytica and SCL cannot buy ads on the world’s largest social media network or administer pages belonging to clients, Andrew Bosworth, a Facebook vice president, said in a Twitter post.
Trump’s campaign hired Cambridge Analytica in June 2016 and paid it more than $6.2 million, according to Federal Election Commission records.
Cambridge Analytica says it uses “behavioral microtargeting”, or combining analysis of people’s personalities with demographics, to predict and influence mass behavior. It says it has data on 220 million Americans, two thirds of the U.S. population.
It has worked on other campaigns in the United States and other countries, and it is funded by Robert Mercer, a prominent supporter of politically conservative groups.
Facebook in its statement described a rocky relationship with Cambridge Analytica and two individuals going back to 2015.
That year, Facebook said, it learned that University of Cambridge professor Aleksandr Kogan lied to the company and violated its policies by sharing data that he acquired with a so-called “research app” that used Facebook’s login system.
Kogan was not immediately available for comment.
The app was downloaded by about 270,000 people. Facebook said that Kogan gained access to profile and other information “in a legitimate way” but “he did not subsequently abide by our rules” when he passed the data to SCL/Cambridge Analytica and Christopher Wylie of Eunoia Technologies. (bit.ly/2FZU1Ir)
Eunoia did not immediately respond to a request for comment.
Facebook said it cut ties to Kogan’s app when it learned of the violation in 2015, and asked for certification from Kogan and all parties he had given data to that the information had been destroyed.
Although all certified that they had destroyed the data, Facebook said that it received reports in the past several days that “not all data was deleted”, prompting the suspension announced on Friday.
Additional Reporting by Ismail Shakil in Bengaluru; Editing by Jonathan Weber, Leslie Adler and Joseph Radford
Most of the questions surrounding the coming age of driverless cars pertain to practical things: regulation, insurance, training protocols for the cars’ remote human backups. Some are philosophical: What do we owe the people whose jobs will be annihilated? Do robo cars need ethics lessons? At least one question is practical and philosophical: How do we know when these things are ready to ditch their human safety drivers and roll about unattended?
No one has much of a response. You could say that as soon as the robot is safer than the average human driver—who crashes once every 238,000 miles or so—it’s wrong to keep it in the lab. Or you can argue that robo cars ought to be held to higher standards: Should they be 10 times better than the human? 1,000 times? Whatever the answer is, data will help us get there. And so we turn to the California DMV’s 2017 Autonomous Vehicle Disengagement Reports.
The Golden State, home to many of the companies leading the robo revolution, has some of the strictest rules for AVs in the country. Operators who run cars on public roads must publicly report any crashes they’re involved in. And at the end of every year, they must hand over data on how many miles they drove and how many times their onboard human safety driver had to take control from the machine—that’s called a disengagement. Combine those, and you have a number approximating how far any company’s self-driving car can go without human help. Something like a grade.
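As a rough illustration of that grade, the calculation can be sketched in a few lines of Python. The function name and the sample figures are hypothetical and are not taken from any company's DMV filing.

```python
def miles_per_disengagement(miles_driven: float, disengagements: int) -> float:
    """Average distance a car covers between human takeovers."""
    if disengagements == 0:
        # No takeovers reported: the ratio is unbounded rather than an error.
        return float("inf")
    return miles_driven / disengagements


# Hypothetical figures for illustration only, not from any DMV report.
example_miles = 10_000.0
example_disengagements = 25

rate = miles_per_disengagement(example_miles, example_disengagements)
print(f"{rate:.0f} miles per disengagement")
```

A higher number means the car went farther, on average, before its safety driver stepped in.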
The metric is imperfect, and this data comes with a crate of caveats. But before we get into those, know this: Waymo (formerly known as Google’s self-driving car project) and General Motors appear to be leading the pack and making rapid progress toward the day when human drivers, with all their inattention and distraction and tendency to crash, will be obsolete.
Ifs and Buts
Disengagement reports have plenty of shortcomings. Here’s the quick rundown:
- They’re unscientific, because each company reports its data in a different way, offering various levels of detail and idiosyncratic explanations for what triggered the human takeover.
- They’re packed with vague language and lack context. Delphi cites “cyclist” as the reason for a bunch of disengagements. Zoox blamed every disengagement on a “planning discrepancy” or “hardware discrepancy.”
- They’re little use for anyone who wants to compare rival companies, because those companies aren’t running the same tests: Waymo does most of its testing in simple suburbs; GM focuses on the complex city. They’re better for tracking the progress of each outfit, but still not great, because those companies change how and where they test over time.
- A disengagement does not mean the car was going to crash, only that the human driver wasn’t 100 percent confident in how it would behave.
- They only cover driving on public roads in California. So we don’t know anything about Ford, which focuses its testing around Detroit and Pittsburgh. We don’t see data for Waymo’s increasingly important test program in Phoenix—where its cars are tooling about without anyone inside.
On the other hand, the disengagement reports are the best data we’ve got for evaluating these development efforts. No state but California demands anything like this, and private companies only share such info when the government demands it.
So, let’s sprinkle some grains of salt on the numbers and take a look. We broke them down into a pair of two-axis charts. The first looks at Waymo and General Motors. It notes how many miles they drove in 2016 and 2017 (in green) and how many miles they averaged between disengagements (in blue). (By the way, Uber didn’t have to file a report, because this data isn’t required until your first full calendar year of testing. Uber didn’t get its permit to test in California until March of 2017.)
The takeaway here is that Waymo’s software remains excellent, and it’s still doing tons of testing in California. For GM, you can see a huge ramp-up in miles driven, and a steep increase in miles per disengagement. That’s progress, and it’s a good thing: GM plans to launch a car without a steering wheel or pedals next year. Keep in mind that GM does nearly all its public street testing in San Francisco, a much more complicated environment than Palo Alto and Mountain View, where Waymo works.
Next, we have the data for Delphi (now known as Aptiv), Nissan, Mercedes-Benz, and Zoox, a San Francisco–based startup working to build a self-driving vehicle that looks nothing like today’s cars—not that it will say anything more than that for the time being. Each has a serious program, but they do so much less testing than Waymo and GM that we put them in their own chart. (Otherwise, the scales would just be totally out of proportion to each other.)
More caveats: Mercedes-Benz may not look so hot in California, but that data’s from just three vehicles. It does much more work in Europe: In 2017, it sent an autonomous S-Class on a five-month tour of five continents. Nissan does a lot of testing at NASA’s Ames Research Center, which doesn’t count as public land, so doesn’t require data reporting. And to get the most interesting bit of data from Zoox, you have to dive into its report.
In its first year of testing (thus the lack of 2016 numbers), it drove just over 100 miles through August. Over the next three months, it drove about 2,000. Yet its rate of disengagements remained steady. Overall, it averaged 160 miles per disengagement. But if you look at just November, when it was doing lots of testing in downtown San Francisco, that number jumps to 430. Even with the caveats, it’s a clear sign that Zoox is making impressive progress—and that more than one of these students is getting ready to throw on a gown, grab its diploma, and give you a ride.
SAN FRANCISCO/WASHINGTON (Reuters) – A 20-year-old Florida man was responsible for the large data breach at Uber Technologies Inc last year and was paid by Uber to destroy the data through a so-called “bug bounty” program normally used to identify small code vulnerabilities, three people familiar with the events have told Reuters.
Uber announced on Nov. 21 that the personal data of 57 million passengers and 600,000 drivers were stolen in a breach that occurred in October 2016, and that it paid the hacker $100,000 to destroy the information. But the company did not reveal any information about the hacker or how it paid him the money.
Uber made the payment last year through a program designed to reward security researchers who report flaws in a company’s software, these people said. Uber’s bug bounty service – as such a program is known in the industry – is hosted by a company called HackerOne, which offers its platform to a number of tech companies.
Reuters was unable to establish the identity of the hacker or another person who sources said helped him. Uber spokesman Matt Kallman declined to comment on the matter.
Newly appointed Uber Chief Executive Dara Khosrowshahi fired two of Uber’s top security officials when he announced the breach last month, saying the incident should have been disclosed to regulators at the time it was discovered, about a year before.
It remains unclear who made the final decision to authorize the payment to the hacker and to keep the breach secret, though the sources said then-CEO Travis Kalanick was aware of the breach and bug bounty payment in November of last year.
Kalanick, who stepped down as Uber CEO in June, declined to comment on the matter, according to his spokesman.
A payment of $100,000 through a bug bounty program would be extremely unusual, with one former HackerOne executive saying it would represent an “all-time record.” Security professionals said rewarding a hacker who had stolen data also would be well outside the normal rules of a bounty program, where payments are typically in the $5,000 to $10,000 range.
HackerOne hosts Uber’s bug bounty program but does not manage it, and plays no role in deciding whether payouts are appropriate or how large they should be.
HackerOne CEO Marten Mickos said he could not discuss an individual customer’s programs. “In all cases when a bug bounty award is processed through HackerOne, we receive identifying information of the recipient in the form of an IRS W-9 or W-8BEN form before payment of the award can be made,” he said, referring to U.S. Internal Revenue Service forms.
According to two of the sources, Uber made the payment to confirm the hacker’s identity and have him sign a nondisclosure agreement to deter further wrongdoing. Uber also conducted a forensic analysis of the hacker’s machine to make sure the data had been purged, the sources said.
One source described the hacker as “living with his mom in a small home trying to help pay the bills,” adding that members of Uber’s security team did not want to pursue prosecution of an individual who did not appear to pose a further threat.
The Florida hacker paid a second person for services that involved accessing GitHub, a site widely used by programmers to store their code, to obtain credentials for access to Uber data stored elsewhere, one of the sources said.
GitHub said the attack did not involve a failure of its security systems. “Our recommendation is to never store access tokens, passwords, or other authentication or encryption keys in the code,” that company said in a statement.
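One common way to follow that recommendation is to read credentials from the environment at run time rather than committing them to a repository. The sketch below assumes a hypothetical environment variable named `SERVICE_ACCESS_TOKEN`; it is an illustration only, not a description of how Uber or GitHub handle secrets.

```python
import os


def load_access_token(var_name: str = "SERVICE_ACCESS_TOKEN") -> str:
    """Read a secret from the environment instead of hard-coding it.

    The variable name is a hypothetical placeholder for this example.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail loudly so a missing secret is noticed at startup.
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token
```

Because the token is supplied when the program is deployed, the source code can be shared or published without exposing it.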
‘SHOUT IT FROM THE ROOFTOPS’
Uber received an email last year from an anonymous person demanding money in exchange for user data, and the message was forwarded to the company’s bug bounty team in what was described as Uber’s routine practice for such solicitations, according to three sources familiar with the matter.
Bug bounty programs are designed mainly to give security researchers an incentive to report weaknesses they uncover in a company’s software. But complicated scenarios can emerge when dealing with hackers who obtain information illegally or seek a ransom.
Some companies choose not to report more aggressive intrusions to authorities on the grounds that it can be easier and more effective to negotiate directly with hackers in order to limit any harm to customers.
Uber’s $100,000 payout and silence on the matter at the time were extraordinary under such a program, according to Luta Security founder Katie Moussouris, a former HackerOne executive.
“If it had been a legitimate bug bounty, it would have been ideal for everyone involved to shout it from the rooftops,” Moussouris said.
Uber’s failure to report the breach to regulators, even though it may have felt it had dealt with the problem, was an error, according to people inside and outside the company who spoke to Reuters.
“The creation of a bug bounty program doesn’t allow Uber, their bounty service provider, or any other company the ability to decide that breach notification laws don’t apply to them,” Moussouris said.
Uber fired its chief security officer, Joe Sullivan, and a deputy, attorney Craig Clark, over their roles in the incident.
“None of this should have happened, and I will not make excuses for it,” Khosrowshahi said in a blog post announcing the hack last month.
Clark worked directly for Sullivan but also reported to Uber’s legal and privacy team, according to three people familiar with the arrangement. It is unclear whether Clark informed Uber’s legal department, which typically handled disclosure issues.
Sullivan and Clark did not respond to requests for comment.
In an August interview with Reuters, Sullivan, a former prosecutor and Facebook Inc (FB.O) security chief, said he integrated security engineers and developers at Uber “with our lawyers and our public policy team who know what regulators care about.”
Last week, three more top managers in Uber’s security unit resigned. One of them, physical security chief Jeff Jones, later told others he would have left anyway, sources told Reuters. Another of the three, senior security engineer Prithvi Rai, later agreed to stay in a new role.
Reporting by Joseph Menn in San Francisco and Dustin Volz in Washington; Additional reporting by Heather Somerville and Stephen Nellis in San Francisco; Editing by Jonathan Weber and Bill Rigby
KUALA LUMPUR (Reuters) – Malaysia’s CIMB Group Holdings Bhd on Monday said some magnetic tapes containing backup customer data were lost during routine operations, adding that there has been no evidence so far that any data has been compromised.
The tapes do not contain any authentication data such as PIN numbers, passwords or credit card security numbers, the country’s second biggest lender said in a statement.
“Several magnetic tapes containing back-up data were physically lost in transit during routine operations. Some of these tapes contain customer information of CIMB Bank and its subsidiaries,” it said.
“Following a thorough and ongoing assessment, there is currently no evidence that any of this information has been compromised.”
The bank said it was working with relevant authorities and taking steps to protect customers. It did not say when the tapes were lost.
CIMB said it has heightened security measures following the loss of the tapes, including temporarily suspending some services via its call center.
In a separate statement, Malaysia’s central bank said it has been assured by CIMB that “necessary precautionary measures and mitigation actions have been taken to manage any possible negative impact arising from the loss of the tapes.”
Earlier this month, Malaysia said it was investigating an alleged attempt to sell data of more than 46 million mobile phone subscribers online, in what appeared to be one of the largest leaks of customer data in Asia.
Reporting by A. Ananthalakshmi, editing by David Evans
Late in 2015, Gilberto Titericz, an electrical engineer at Brazil’s state oil company Petrobras, told his boss he planned to resign, after seven years maintaining sensors and other hardware in oil plants. By devoting hundreds of hours of leisure time to the obscure world of competitive data analysis, Titericz had recently become the world’s top-ranked data scientist, by one reckoning. Silicon Valley was calling. “Only when I wanted to quit did they realize they had the number-one data scientist,” he says.
Petrobras held on to its champ for a time by moving Titericz into a position that used his data skills. But since topping the rankings that October he’d received a stream of emails from recruiters around the globe, including representatives of Tesla and Google. This past February, another well-known tech company hired him, and moved his family to the Bay Area this summer. Titericz described his unlikely journey recently over colorful plates of Nigerian food at the headquarters of his new employer, Airbnb.
Titericz earned, and holds, his number-one rank on a website called Kaggle that has turned data analysis into a kind of sport, and transformed the lives of some competitors. Companies, government agencies, and researchers post datasets on the platform and invite Kaggle’s more than one million members to discern patterns and solve problems. Winners get glory, points toward Kaggle’s rankings of its top 66,000 data scientists, and sometimes cash prizes.
Alone and in small teams with fellow Kagglers, Titericz estimates he has won around $100,000 in contests that included predicting seizures from brainwaves for the National Institutes of Health, the price of metal tubes for Caterpillar, and rental property values for Deloitte. The TSA and real-estate site Zillow are each running competitions offering prize money in excess of $1 million.
Veteran Kagglers say the opportunities that flow from a good ranking are generally more bankable than the prizes. Participants say they learn new data-analysis and machine-learning skills. Plus, the best performers like the 95 “grandmasters” that top Kaggle’s rankings are highly sought talents in an occupation crucial to today’s data-centric economy. Glassdoor has declared data scientist the best job in America for the past two years, based on the thousands of vacancies, good salaries, and high job satisfaction. Companies large and small recruit from Kaggle’s fertile field of problem solvers.
In March, Google came calling and acquired Kaggle itself. It has been integrated into the company’s cloud-computing division, and has begun to emphasize features that let people and companies share and test data and code outside of competitions, too. Google hopes other companies will come to Kaggle for the people, code, and data they need for new projects involving machine learning—and run them in Google’s cloud.
Kaggle grandmasters say they’re driven as much by a compulsion to learn as to win. The best go to extreme lengths to do both. Marios Michailidis, a previous number one now ranked third, got the data-science bug after hearing a talk on entrepreneurship from a man who got rich analyzing trends in horseraces. To Michailidis, the money was not the most interesting part. “This ability to explore and predict the future seemed like a superpower to me,” he says. Michailidis taught himself to code, joined Kaggle, and before long was spending what he estimates was 60 hours a week on contests—in addition to a day job. “It was very enjoyable because I was learning a lot,” he says.
Michailidis has since cut back to roughly 30 hours a week, in part due to the toll on his body. Titericz says his own push to top the Kaggle rankings, made not long after the birth of his second daughter, caused some friction with his wife. “She’d get mad with me every time I touched the computer,” he says.
Entrepreneur SriSatish Ambati has made Kagglers a core strategy of his startup, H2O, which makes data-science tools for customers including eBay and Capital One. Ambati hired Michailidis and three other grandmasters after he noticed a surge in downloads when H2O’s software was used to win a Kaggle contest. Victors typically share their methods in the site’s busy forums to help others improve their technique.
H2O’s data celebrities work on the company’s products, providing both expertise and a marketing boost akin to a sports star endorsing a sneaker. “When we send a grandmaster to a customer call their entire data-science team wants to be there,” Ambati says. “Steve Jobs had a gut feel for products; grandmasters have that for data.” Jeremy Achin, cofounder of startup DataRobot, which competes with H2O and also has hired grandmasters, says high Kaggle rankings also help weed out poseurs trying to exploit the data-skills shortage. “There are many people calling themselves data scientists who are not capable of delivering actual work,” he says.
Competition between people like Ambati and Achin helps make it lucrative to earn the rank of grandmaster. Michailidis, who works for Mountain View, California-based H2O from his home in London, says his salary has tripled in three years. Before joining H2O, he worked for customer analytics company Dunnhumby, a subsidiary of supermarket Tesco.
Large companies like Kaggle champs, too. An Intel job ad posted this month seeking a machine-learning researcher lists experience winning Kaggle contests as a requirement. Yelp and Facebook have run Kaggle contests that dangle a chance to interview for a job as a prize for a good finish. The winner of Facebook’s most recent contest last summer was Tom Van de Wiele, an engineer for Eastman Chemical in Ghent, Belgium, who was seeking a career change. Six months later, he started a job at Alphabet’s artificial-intelligence research group DeepMind.
H2O is trying to bottle some of the lightning that sparks from Kaggle grandmasters. Select customers are testing a service called Driverless AI that automates some of a data scientist’s work, probing a dataset and developing models to predict trends. More than 6,000 companies and people are on the waitlist to try Driverless. Ambati says that reflects the demand for data-science skills, as information piles up faster than companies can analyze it. But no one at H2O expects Driverless to challenge Titericz or other Kaggle leaders anytime soon. For all the data-crunching power of computers, they lack the creative spark that makes a true grandmaster.
“If you work on a data problem in a company you need to talk with managers, and clients,” says Stanislav Semenov, a Moscow-based grandmaster and former number one who is now ranked second. He likes to celebrate Kaggle wins with a good steak. “Competitions are only about building the best models, it’s pure and I love it.” On Kaggle, data analysis is not just a sport, but an art.
SINGAPORE/BANGKOK (Reuters) – When diaper maker DSG International (Thailand) wants to know what its customers are thinking, it often turns to Lazada, an e-commerce firm majority-owned by Alibaba Group Holding (BABA.N).
“From (their) data, we know mothers sometimes browse at night, so we can offer flash sales when we know customers are browsing,” says Ambrose Chan, the Thai company’s CEO.
Southeast Asia is the world’s fastest-growing internet market, home to 600 million consumers from Vietnam to Indonesia via Singapore, many of them tech- and social media-savvy. They are rapidly spending more time and money online. A Nielsen study in 2015 estimated Southeast Asia’s middle class will hit 400 million by 2020, doubling from 2012.
Gross merchandise value of e-commerce in Southeast Asia will balloon to $65.5 billion by 2021, from $14.3 billion last year, predicts consultancy Frost & Sullivan.
Research firm Euromonitor forecasts internet retailing in Indonesia, for example, will more than double to $6.2 billion by 2021, and Thailand will increase 85 percent to $2.8 billion.
(For a graphic on Southeast Asia internet sales click reut.rs/2l3qULe)
Consumer goods firms, such as Unilever (UNc.AS) and Japanese cosmetics firm Shiseido (4911.T), say the e-commerce boom allows them to push deeper into markets that can otherwise be difficult to understand and tough to penetrate due to poor retail networks and infrastructure.
“Data from Lazada has been used to position certain products where consumer preferences are different. For example, Thai customers like to buy diapers in special cartons, while Malaysians prefer multiple packs,” says Chan.
To reach more customers and get a better handle on their online behavior, consumer goods companies are forging partnerships with e-commerce firms like Lazada and fashion website Zalora.
A customer who clicked on a 50 milliliter product may instead buy a smaller 30 ml product, said Pranay Mehra, vice president, digital and e-commerce at Shiseido Asia Pacific, noting that data and online selling experience can help firms bundle offers, decide on packaging and distribution, and influence where to set up a physical presence.
“This data is very powerful and very insightful, if used properly,” Mehra added.
Unilever, whose products range from Hellmann’s mayonnaise to Dove soap, said it is seeing more demand from rural consumers in developing markets like Indonesia and Vietnam.
“With all our e-commerce partners, we’re using data to help us find innovative solutions to unlock key barriers of high cost delivery and poor credit card penetration in remote areas,” said Anusha Babbar, e-commerce director at Unilever Southeast Asia and Australasia.
The conglomerate, which works with the likes of Singapore online grocer RedMart, Indonesia’s Blibli and Vietnam’s Tiki, said it introduced its St Ives skincare brand on Lazada after seeing a trend towards natural products and shopper search data.
DATA AND LOGISTICS
“Traditional retailers will struggle to see customer behavior,” said Lazada Thailand’s CEO, Alessandro Piscini. “We can tell if a customer is pregnant from their search behavior.”
Lazada, he said, plans to use data science to help its merchants customize offers for specific customer groups based on age, gender and other preferences.
Zalora, which sells clothing and accessories online in markets including Singapore, Malaysia and Indonesia, said it was working on ad-hoc projects with some brands to help them understand their customers based on data.
Lazada and Zalora are among the few e-commerce platforms that operate in multiple Southeast Asian countries. But the region is becoming a new battleground as Amazon (AMZN.O) and JD.com (JD.O) make beachheads in Singapore and Thailand.
Lazada Thailand will focus on partnering with fast-moving consumer goods companies to maintain its lead, Piscini said, and is expanding its logistics footprint across a region that has poor roads, clogged cities and thousands of often remote islands.
To be sure, online still contributes a tiny portion to consumer goods companies’ sales, but some local firms are going beyond partnerships and investing in their own e-commerce capabilities.
Thailand’s top consumer goods manufacturer Saha Group (SPI.BK) (SPC.BK) has seen online sales of some of its brands rise tenfold since it began a partnership with Lazada in June, but online still represents just 1-2 percent of total sales.
Saha is using e-commerce data to customize offerings.
“We now make real-time offerings to customers. Before, promotions would be seasonal,” Chairman Boonsithi Chokwatana told Reuters.
The company, whose products include instant noodles, toothpaste and laundry detergent, is investing 2 billion baht ($60 million) in logistics to support its e-commerce ambitions, including a 21-storey warehouse and a big data team, he said.
Reporting by Aradhana Aravindan in SINGAPORE and Chayut Setboonsarng in BANGKOK; Editing by Ian Geoghegan
The Department of Homeland Security is proposing to expand the files it collects on immigrants, as well as some citizens, by including more online data—most notably search results and social media information—about each individual.
The plan, which would cover data like Facebook posts or Google results, is set out in the Federal Register, where the government publishes forthcoming regulations. A final version is set to go into effect on Oct. 18.
The plan, reported by BuzzFeed, is notable partly because it permits the government to amass information not only about recent immigrants, but also about green card holders and naturalized Americans.
The proposal to collect social media data is set out in a part of the draft regulation that describes expanding the content of so-called “Alien Files,” which serve as detailed profiles of individual immigrants, and are used by everyone from border agents to judges. Here is the relevant portion:
The Department of Homeland Security, therefore, is updating the [file process] to … (5) expand the categories of records to include the following: country of nationality; country of residence; the USCIS Online Account Number; social media handles, aliases, associated identifiable information, and search results
The proposal follows new rules by the Trump Administration that require visitors from certain countries to disclose their social media handles, and allow border agents to view their list of phone contacts.
Those earlier measures alarmed civil rights advocates who questioned whether they would do much to improve security, and worried other countries would introduce similar screening of Americans. In response to the latest effort to collect social media data, the American Civil Liberties Union warned of a “chilling effect.”
“This Privacy Act notice makes clear that the government intends to retain the social media information of people who have immigrated to this country, singling out a huge group of people to maintain files on what they say. This would undoubtedly have a chilling effect on the free speech that’s expressed every day on social media,” the group said in a statement.
The new rules are subject to a comment period until Oct. 18 but, if they go into effect as planned, they will add yet more data to “Alien Files” that can already contain information such as fingerprints, travel histories, and health and education records.
Such repositories provide powerful intelligence-gathering tools, but bring potential privacy risks such as government surveillance or cyber-attacks.
Dubai Electricity and Water Authority (DEWA)’s data hub subsidiary, Data Hub Integrated Solutions LLC (MORO), has signed a Memorandum of …