Big Data for Business and Society

AI: Generative image and text models are very good. Generative Adversarial Networks (GANs) take two models: one produces deep fakes, while the other tries to tell those fakes apart from real photos.
Big Data: Text, images, audio, video, etc. By 2025, every connected person will have about 4,900 digital engagements per day, and there will be 175 zettabytes of data. How is it stored? Can it be destroyed? How is it made? What are the privacy concerns? Is its collection consensual?
Connectivity: Everything is now being connected and delivered by light. By 2018, half of humanity was online. Sending something to America takes about 0.2 seconds, so a business can contact and gain customers anywhere in the world.
Software: More businesses and industries are being run on software and delivered as online services, and software can now be delivered at global scale. Five billion people have access to software through smartphones.
Classical Data: Driven by a clear scientific theory or question. Costly and hard to collect. This is the historical approach to data science.
Big Data: Passively collected (agents are unaware of the collection); consistent (one measurement platform is used, so there is no human bias); comprehensive (we know about everything within the scope). Its defining features are Volume (size/scale of the information), Velocity (speed at which data is created), Variety (structured and unstructured data of many types), Technology (the need for new processing techniques) and Methods (structuring unstructured data, model building, pattern finding). It gathers many pieces of data, not just the data needed to answer one specific question.
Challenges working with Big Data: It is unstructured and messy. It also includes alternative data, i.e. data from non-classical sources, which is not collected like classical data (it has no specific motive).
Such data also lacks protocol, methodology, historical record-keeping and clarity on how it was made.
Characteristics Important for Intelligent Life: Ten metacomponents: recognise that a problem exists; recognise the type of problem; select the lower-order actions needed; form a strategy around those components; select a framework to represent information related to the problem; decide how to allocate attention; monitor performance; judge the quality of performance feedback; decide how to respond to feedback; act on feedback to advance the strategy. Performance components: skills, capabilities, functions. Knowledge acquisition component: learning new information and skills. If AI behaves the same way as humans, it is intelligent.
Strong AI: Machines able to tackle any problem using their skills, as a human can. Weak AI: Able to solve problems in well-defined domains.
Turing Test: An AI and a human both produce outputs for a human observer, who decides which is the AI and which is the human. If they can't be told apart, the AI has passed the test and is deemed intelligent. The test can also be used to make AI better at its job. GLUE and SuperGLUE are benchmark datasets that compare language models against human performance. AI can learn a task from a few previous examples (called “shots”), known as one-, two- and three-shot learning; zero-shot learning, with no examples at all, is ideal.
Core AI technologies: Penalised regression (handles a large number of covariates, with penalties for using more of them). Support Vector Machines (separate one set of data from the other with a dividing hyperplane). Decision trees, random forests and ensembles (if/then/else models: if X happens do Y, else do Z; they make predictions by linking variables in previous examples).
AI and Big Data Are Important for Each Other: AI models need to learn from training data provided by Big Data (more examples = volume, more up-to-date examples = velocity, more diverse examples = variety). Big Data needs AI to turn its unstructured data into something understandable.
RoboDebt: What made this settlement newsworthy was the scale of the numbers in this case: 400,000 people and $398 million in debt.
The Australian Government’s Robo-debt automated the manual fraud-detection process and removed some safeguards, with a view to cracking down on compliance issues and reclaiming historical overpayments. As designed, the system was legally insecure. Centrelink raised Amato’s debt based solely on averaging her income data and applied a 10 percent penalty to that amount (as it was entitled to do under the Social Security Act 1991). In 2016 the incumbent government announced that it would improve the budget position by $1.1 billion through a crackdown on welfare fraud. Identified compliance issues went from 20,000 in 2015/16 to 783,000 the next year.
A woman was ordered to pay back $4,386.09 to Centrelink after it found a discrepancy between her reported earnings and her ATO income for 2013. In 2013, Ms C was studying, working part time and receiving Austudy and Newstart Allowance. She told the Ombudsman she found it difficult to navigate Centrelink’s Online Compliance Intervention (OCI) system. After going through the OCI system, she received a demand for immediate payment from a debt collection agency. A compliance officer incorrectly changed the online assessment to ‘completed’ on 12 December 2016, which resulted in Ms C receiving an incorrect letter advising she owed $0. Ms C eventually agreed to pay a lump sum of $500 and ongoing payments of $80 per week. DHS (the Department of Human Services) advised the Ombudsman that after a manual reassessment her debt was reduced to $507.55 on 17 January 2017.
The Australian Ombudsman’s report into Centrelink’s new ‘Robo-debt’ system highlights a number of cases of people with little or no access to the system being left out of the loop. For Ms C, the mental and emotional damage far outweighed the mere waste of her time in trying to sort out her debt. Spotting potential compliance issues with averaged income data has promise, but it has only ever been an imperfect solution.
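The notes say Centrelink raised Amato’s debt solely by averaging her income data. A minimal sketch of why that is risky, assuming a simplified, hypothetical income test (`FULL_RATE`, `INCOME_FREE_AREA` and `TAPER` are made-up illustration values, not real Centrelink rates): spreading a year’s ATO income evenly across 26 fortnights can create a debt for someone who earned all of that income while off payments and was paid correctly.

```python
# Minimal sketch of Robo-debt-style income averaging.
# All payment rules here are HYPOTHETICAL illustration values, not real Centrelink rates.

FULL_RATE = 500.0         # hypothetical maximum payment per fortnight ($)
INCOME_FREE_AREA = 300.0  # hypothetical income allowed per fortnight before the payment is reduced ($)
TAPER = 0.5               # hypothetical reduction: 50 cents per dollar earned above the free area
PENALTY = 0.10            # the 10 percent penalty mentioned in the notes

FORTNIGHTS_PER_YEAR = 26


def entitlement(fortnight_income: float) -> float:
    """Payment owed for one fortnight under the hypothetical income test."""
    excess = max(0.0, fortnight_income - INCOME_FREE_AREA)
    return max(0.0, FULL_RATE - TAPER * excess)


def averaged_debt(incomes: list[float], on_payment: list[bool]) -> float:
    """Debt raised when annual income is spread evenly across every fortnight.

    Assumes the person reported their real fortnightly income correctly, so the
    amount actually paid equals their true entitlement.
    """
    average = sum(incomes) / len(incomes)
    paid = sum(entitlement(inc) for inc, on in zip(incomes, on_payment) if on)
    # Averaging pretends every fortnight on payment had the same (average) income.
    recalculated = sum(entitlement(average) for on in on_payment if on)
    overpayment = max(0.0, paid - recalculated)
    return round(overpayment * (1 + PENALTY), 2)


# Hypothetical year: 10 fortnights of full-time work at $2,600 while off payment,
# then 16 fortnights studying on payment with no income at all.
lumpy_incomes = [2600.0] * 10 + [0.0] * 16
lumpy_on_payment = [False] * 10 + [True] * 16
print(averaged_debt(lumpy_incomes, lumpy_on_payment))  # 6160.0

# Same annual income earned evenly: averaging matches reality, so no debt.
steady_incomes = [1000.0] * FORTNIGHTS_PER_YEAR
steady_on_payment = [True] * FORTNIGHTS_PER_YEAR
print(averaged_debt(steady_incomes, steady_on_payment))  # 0.0
```

With lumpy income, averaging raises a $6,160 debt (including the 10% penalty) for someone who was never overpaid; with steady income the two methods agree. Real assessments involve many more rules, so this only illustrates the averaging flaw itself.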
It’s possible that a system like this can only work by paring back some individual protections, such as secondary-source checks and the presumption of innocence.
Brick and Mortar Shopping: For the customer there is anonymity, a spatial constraint (you can't view every product), an information constraint (you can't compare with other stores), a generic store layout and generic advertising. For the store, product positioning must be considered carefully, personal information is gained only through loyalty programs, market segmentation is almost impossible and market research is very costly. Privacy is very high while personalisation is near impossible.
Web Stores: The store collects information about you to personalise the experience, so anonymity is gone. Item-Based Collaborative Filtering: takes an item you've looked at and uses data from other users who looked at that item to predict the next item you will purchase. Keys: every person has a key that is shared between websites, so each site can add and share information about you and personalise your experience. Trade-offs: judgements of personality made by algorithms are better than judgements made by close others; with more than 300 Facebook likes, the algorithm knows you better than your spouse does. Privacy is non-existent or tiny while personalisation is complete. Do users have a right to privacy online? Who controls the algorithms? What accountability do they have?
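The item-based recommendation idea in the web-store notes is usually known as item-based collaborative filtering. A minimal sketch, assuming a hypothetical purchase history in which each user is just the set of items they bought: two items count as similar when the same users bought both (cosine similarity of their buyer sets), and the items most similar to the one you viewed are recommended. The users, items and the choice of cosine similarity are illustrative assumptions, not any particular store's system.

```python
from math import sqrt

# Hypothetical purchase history: user -> set of items they bought.
purchases = {
    "ann": {"tent", "sleeping_bag", "torch"},
    "ben": {"tent", "sleeping_bag"},
    "cara": {"tent", "torch"},
    "dev": {"novel", "reading_lamp"},
    "eve": {"novel", "sleeping_bag"},
}


def users_by_item(history: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the history so each item maps to the set of users who bought it."""
    index: dict[str, set[str]] = {}
    for user, items in history.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary user vectors: shared users / sqrt(|a| * |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(viewed: str, history: dict[str, set[str]], k: int = 2) -> list[str]:
    """Return the k items whose buyers overlap most with the buyers of `viewed`."""
    index = users_by_item(history)
    target = index[viewed]
    scores = {item: cosine(target, buyers) for item, buyers in index.items() if item != viewed}
    # Highest score first; ties broken alphabetically so the result is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


print(recommend("tent", purchases))  # ['torch', 'sleeping_bag']
```

Item-to-item similarities depend only on purchase overlap, so they can be computed ahead of time and reused for every shopper who views the same item; a production system would also drop items the shopper already owns.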



Facebook: Facebook and Google had become a “digital duopoly.” The Wall Street Journal reported that Facebook and Google together accounted for nearly half of global spending on digital advertising and 77% of growth in the U.S. online advertising market in 2016. They also had tremendous influence with users, particularly in the U.S., where 66% of the population got their news from Facebook. One author called them “Attention Merchants,” referring to how they essentially monetized consumers’ attention.
In March 2018, reporters broke the story of Cambridge Analytica’s mass collection and distribution of Facebook user data. One campaign claimed to have used the data in the 2016 primaries, but not in the general election. When asked to comment on Facebook’s privacy policy, Apple CEO Tim Cook stated, “Privacy is a human right to us.” Although the Cambridge Analytica revelation ignited public discourse around user privacy rights, news sources had reported this kind of data mining for several years. It was also not the first time Facebook had been used to influence politics. In addition to dealing with privacy violations, social media platforms faced calls both to moderate content and not to moderate content.
There was some debate around changing Facebook’s business model to a subscription-based approach, which some circles mooted as a way to better align user interests with Facebook’s. While much of the recent drama impacting Facebook centered on U.S. elections, almost 90% of Facebook’s 2.2 billion users were outside North America. Some were skeptical that a new business model would work. Nearly two-thirds of the 33,000 respondents to a 2018 U.K. survey felt that online companies were not regulated enough, lacked transparency and sold user data inappropriately. More than half felt the companies preyed on user loneliness, and a third thought social media was not a force for good in society.
Unlike in other industries, an independent review board for social media and digital content had yet to emerge. Zuckerberg, in a Washington Post op-ed, called for more regulation, imploring the government to focus on “new rules for the internet,” primarily around four topics: privacy, harmful content, election integrity and data portability. Facebook, along with Google, Apple and Amazon, had significantly beefed up their U.S. lobbying resources, reportedly spending over $50 million in 2017 to lobby Congress. In the early phases of the 2020 U.S. presidential election, candidate Elizabeth Warren called on the FTC to enforce existing antitrust laws to “break up Big Tech.”
The Cambridge Analytica crisis caused Facebook’s market capitalization to drop by nearly $100 billion, or 18%, over 10 days in late March 2018. After the Cambridge Analytica scandal, only seven of Facebook’s top 1,000 ad spenders halted their spending, and two did so for reasons unrelated to Cambridge Analytica. Like Procter & Gamble, some brands cut back spending when they discovered they were serving ads to “bots” (web-based software applications running repetitive tasks) rather than actual people. With lower profitability forecast and growth moderating, Facebook’s stock plummeted more than 17%, wiping out nearly $130 billion in market value, the largest single-day drop for a public company at the time.