AI, Big Data, and the Future of Connectivity
AI: Generative Image and Text Models
Generative image and text models are highly effective. A Generative Adversarial Network (GAN) trains two models against each other: a generator that produces fake samples (such as deepfake images) and a discriminator that tries to tell those fakes apart from real photos.
Big Data: Characteristics and Concerns
Big Data encompasses text, images, audio, video, and more. By 2025, each connected person is projected to have about 4,900 digital engagements per day, and the world's data is projected to total 175 zettabytes. Key questions arise:
- How is this data stored?
- Can it be destroyed?
- How is it created?
- What are the privacy concerns?
- Is data collection consensual?
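To put the projections above in perspective, the short calculation below converts them into more familiar units. It is only back-of-the-envelope arithmetic on the two figures quoted in this section; the variable names are illustrative, and a zettabyte is assumed to mean 10^21 bytes (the decimal definition).

```python
# Back-of-the-envelope conversion of the two projections quoted above.
SECONDS_PER_DAY = 24 * 60 * 60
ENGAGEMENTS_PER_DAY = 4900        # figure quoted in the notes
PROJECTED_ZETTABYTES = 175        # figure quoted in the notes
BYTES_PER_ZETTABYTE = 10 ** 21    # assumption: decimal zettabyte
BYTES_PER_TERABYTE = 10 ** 12     # assumption: decimal terabyte

# Average gap between two engagements for one connected person.
seconds_between_engagements = SECONDS_PER_DAY / ENGAGEMENTS_PER_DAY

# The same data volume expressed in bytes and in one-terabyte units.
projected_bytes = PROJECTED_ZETTABYTES * BYTES_PER_ZETTABYTE
projected_terabytes = projected_bytes // BYTES_PER_TERABYTE

print(f"One engagement roughly every {seconds_between_engagements:.1f} seconds")
print(f"175 ZB is about {projected_terabytes:,} terabytes")
```

In other words, the projection implies an engagement every few seconds, around the clock, which is why most of this data can only be collected passively.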
Connectivity: The Global Network
Everything is now connected, with data carried as light through fiber-optic cables. By 2018, half of humanity was online. Sending data to America takes just 0.2 seconds, so businesses can reach and acquire customers anywhere in the world.
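The 0.2-second figure can be sanity-checked from the speed of light in glass. The sketch below is a rough estimate under stated assumptions: the 12,000 km route length is a made-up illustrative distance (the notes do not say where the data starts), the refractive index of 1.5 is an assumed typical value for optical fiber, and the function name is hypothetical.

```python
# Rough propagation-delay estimate for light travelling through optical fiber.
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.5        # assumption: typical value for glass fiber
ASSUMED_ROUTE_KM = 12_000           # assumption: illustrative cable length only


def one_way_delay_seconds(distance_km, refractive_index=FIBER_REFRACTIVE_INDEX):
    """Time for light to cover distance_km inside fiber, ignoring all equipment."""
    speed_in_fiber = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_in_fiber


one_way = one_way_delay_seconds(ASSUMED_ROUTE_KM)
round_trip = 2 * one_way
print(f"one way: {one_way:.3f} s, round trip: {round_trip:.3f} s")
```

Under these assumptions the pure propagation delay comes in below 0.2 seconds even for a round trip; real connections add routing, queuing, and processing time on top, so a figure of that order is plausible.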
Software: The Digital Transformation
More businesses and industries are run on software and delivered as online services. Software can now be distributed at global scale, with five billion people having access via smartphones.
Classical Data vs. Big Data
- Classical Data: Driven by a clear scientific theory or question. It is costly and difficult to obtain, representing the historical approach to data science.
- Big Data: Passively collected, often without the agents’ awareness. It is consistent (using one measurement platform, minimizing human bias) and comprehensive (covering everything within scope).
Big Data Characteristics
- Volume: Size and scale of information.
- Velocity: Speed at which data is created.
- Variety: Data is both structured and unstructured.
- Technology: Need for advanced processing techniques.
- Methods: Structuring unstructured data, model building, and pattern finding.
Big Data gathers many kinds of information at once, rather than only the data needed to answer a specific question.
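As a small illustration of the "structuring unstructured data" step listed under Methods, the sketch below pulls fields out of free-text messages and turns them into records. The messages, the field names, and the pattern are all invented for this example; real pipelines typically need far more robust parsing.

```python
import re

# Hypothetical pattern: an order number followed, later in the text, by an ISO date.
ORDER_PATTERN = re.compile(
    r"order\s+#?(?P<order_id>\d+).*?(?P<date>\d{4}-\d{2}-\d{2})",
    re.IGNORECASE,
)


def structure_messages(messages):
    """Turn free-text messages into dict records; skip text with no match."""
    records = []
    for text in messages:
        match = ORDER_PATTERN.search(text)
        if match:
            records.append(
                {"order_id": int(match["order_id"]), "date": match["date"]}
            )
    return records


# Made-up unstructured input.
raw_messages = [
    "Hi, order #1042 arrived damaged on 2024-03-05, please help",
    "ORDER 77 was fine, delivered 2024-02-11.",
    "Love the shop!",
]

records = structure_messages(raw_messages)
print(records)
```

The third message carries no extractable fields and is dropped, which hints at why messy data makes the structuring step lossy.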
Challenges with Big Data
Big Data is often unstructured and messy. Alternative data, sourced from non-classical origins, lacks a specific motive, protocol, or methodology, and has no historical record-keeping.
Characteristics of Intelligent Life
Ten metacomponents are crucial for intelligent life:
- Recognize problem existence.
- Recognize problem type.
- Select lower-order actions.
- Form a strategy.
- Select a framework for information representation.
- Decide how to allocate attention.
- Monitor performance.
- Judge the quality of performance feedback.
- Respond to feedback.
- Act on feedback to proceed with the strategy.
Performance components include skills, capabilities, and functions. The knowledge acquisition component involves learning new information and skills. If AI behaves similarly to humans, it is considered intelligent.
Strong AI vs. Weak AI
- Strong AI: Machines can tackle any problem using their skills, similar to humans.
- Weak AI: Machines can solve problems in well-defined domains.
The Turing Test
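In the Turing Test, an AI and a human each produce outputs, and a human observer tries to tell which is which. If the observer cannot distinguish them, the AI is deemed intelligent. Benchmark datasets such as GLUE and SuperGLUE play a similar role for language models, comparing their scores on language understanding tasks against human baselines. AI learns from previous examples supplied with the task (one-shot, two-shot, and three-shot learning); zero-shot learning, where no examples are needed, is the ideal.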
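The "shot" counts mentioned above refer to how many worked examples a model sees alongside the task. The sketch below shows one plausible way to assemble zero-shot and few-shot prompts as plain text. The task wording, the example sentences, and the `build_prompt` helper are all hypothetical; no particular model or API is assumed.

```python
def build_prompt(task, examples, query):
    """Assemble a plain-text prompt with zero or more worked examples."""
    blocks = [task]
    for text, label in examples:
        blocks.append(f"Input: {text}\nLabel: {label}")
    # The final block leaves the label empty for the model to fill in.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Made-up task and examples for illustration.
TASK = "Classify the sentiment of each sentence as positive or negative."
EXAMPLES = [
    ("The delivery was fast and the staff were friendly.", "positive"),
    ("The app crashed twice before I could pay.", "negative"),
    ("Great value for the price.", "positive"),
]
QUERY = "The checkout page was confusing."

zero_shot = build_prompt(TASK, [], QUERY)          # no examples at all
one_shot = build_prompt(TASK, EXAMPLES[:1], QUERY)  # a single worked example
three_shot = build_prompt(TASK, EXAMPLES, QUERY)    # three worked examples

print(three_shot)
```

The only difference between the three prompts is the number of worked examples, which is what makes zero-shot performance the most demanding test.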
Core AI Technologies
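- Penalized Regression: Handles a large number of covariates by adding a penalty for each one the model uses, which pushes the model to rely on only the most useful covariates.
- Support Vector Machines (SVM): Separate classes of data with a hyperplane, chosen to leave the widest possible margin between the classes.
- Decision Trees, Random Forests, Ensembles: If/then/else models that make predictions by splitting on variables, with the rules learned from previous examples. Random forests and other ensembles combine many such models.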
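To make the penalty idea concrete, here is a minimal pure-Python sketch of one common form of penalized regression, the lasso, fitted by coordinate descent. The notes do not name a specific penalty, so the choice of lasso is an assumption; the function names and the tiny dataset are invented. The objective minimized is the sum of squared errors plus `penalty` times the sum of absolute coefficients, with no intercept term.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, snapping small values to zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimize sum((y - x.b)^2) + penalty * sum(|b|), one coefficient at a time.

    Assumes every covariate column has at least one non-zero value.
    """
    n_features = len(rows[0])
    coefs = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0   # correlation of covariate j with the partial residual
            norm = 0.0  # squared length of covariate j
            for row, target in zip(rows, targets):
                others = sum(
                    coefs[k] * row[k] for k in range(n_features) if k != j
                )
                rho += row[j] * (target - others)
                norm += row[j] ** 2
            coefs[j] = soft_threshold(rho, penalty / 2) / norm
    return coefs


# Made-up data: the target depends strongly on covariate 0, barely on covariate 1.
rows = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
targets = [3 * a + 0.1 * b for a, b in rows]

unpenalized = lasso_coordinate_descent(rows, targets, penalty=0.0)
penalized = lasso_coordinate_descent(rows, targets, penalty=2.0)
print("no penalty:  ", unpenalized)
print("with penalty:", penalized)
```

With the penalty switched on, the weak covariate's coefficient is driven to exactly zero while the strong one is only shrunk slightly, which is how this kind of penalty copes with many covariates.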
AI and Big Data Interdependence
AI models require training data from Big Data to learn effectively. More examples (volume), up-to-date examples (velocity), and diverse examples (variety) improve AI performance. Big Data needs AI to transform unstructured data into understandable insights.
The RoboDebt Scandal
The RoboDebt settlement involved 400,000 people and $398 million in debt. The Australian Government automated what had been a manual fraud detection process and removed safeguards in order to reclaim historical overpayments. The system was legally insecure: Centrelink averaged income data rather than using the income actually earned in each period, and applied a 10% penalty. In 2016, the government announced a crackdown on welfare fraud, and identified compliance issues rose from 20,000 in 2015/16 to 783,000 the following year. Highlighted cases included incorrect debt assessments and inadequate communication, both of which caused significant distress. The Ombudsman’s report revealed systemic issues and emphasized the emotional and mental toll on affected individuals.
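A small worked example shows why income averaging can produce a debt that does not exist. The sketch below assumes that averaging means spreading a yearly income total evenly over 26 fortnights. Every number and rule in it other than the 10% penalty is invented for illustration (payment rate, income-free area, taper rate, and the person's earnings); it is not Centrelink's actual calculation.

```python
FORTNIGHTS_PER_YEAR = 26
BASE_PAYMENT = 500       # hypothetical fortnightly benefit, in dollars
INCOME_FREE_AREA = 100   # hypothetical income allowed before any reduction
TAPER_PERCENT = 50       # hypothetical: benefit falls 50 cents per extra dollar
PENALTY_PERCENT = 10     # the 10% penalty mentioned in the notes


def entitlement(fortnightly_income):
    """Benefit payable for one fortnight under the hypothetical rules above."""
    excess = max(0, fortnightly_income - INCOME_FREE_AREA)
    reduction = excess * TAPER_PERCENT // 100
    return max(0, BASE_PAYMENT - reduction)


def assessed_debt(payments, incomes):
    """Total paid above entitlement, judged fortnight by fortnight."""
    return sum(max(0, paid - entitlement(inc)) for paid, inc in zip(payments, incomes))


# Made-up person: works for half the year, then claims benefits with no income.
actual_incomes = [2000] * 13 + [0] * 13
payments = [0] * 13 + [BASE_PAYMENT] * 13

# Averaging spreads the annual total evenly over every fortnight.
annual_income = sum(actual_incomes)
averaged_incomes = [annual_income // FORTNIGHTS_PER_YEAR] * FORTNIGHTS_PER_YEAR

true_debt = assessed_debt(payments, actual_incomes)
averaged_debt = assessed_debt(payments, averaged_incomes)
averaged_debt_with_penalty = averaged_debt + averaged_debt * PENALTY_PERCENT // 100

print("debt using actual fortnightly income:", true_debt)
print("debt using averaged income:", averaged_debt)
print("averaged debt plus 10% penalty:", averaged_debt_with_penalty)
```

In this made-up case the person was paid exactly what they were entitled to, yet averaging makes it look as though they earned income while on benefits, and the penalty is then added on top of that phantom debt.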
Brick and Mortar vs. Web Stores
Brick and Mortar Shopping:
- Customer anonymity
- Spatial constraints (limited product view)
- Information constraints (difficulty comparing with other stores)
- Generic store layout and advertising
- Stores face challenges in positioning and gathering personal information, relying on loyalty programs.
- Market segmentation is nearly impossible, and market research is costly.
- High privacy, low personalization.
Web Stores:
- Collect user information to personalize the experience, eliminating anonymity.
Item-Based Collaborative Filtering
This method takes an item you have viewed and uses the behavior of other users who looked at the same item to predict what you will buy next.
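The idea can be sketched in a few lines of pure Python. The version below scores items by how much their audiences overlap (cosine similarity between the sets of users who viewed each item), which is one common choice rather than the only one. The user names, products, and function names are all invented for illustration.

```python
from math import sqrt


def item_audiences(views_by_user):
    """Invert user -> items into item -> set of users who viewed it."""
    audiences = {}
    for user, items in views_by_user.items():
        for item in items:
            audiences.setdefault(item, set()).add(user)
    return audiences


def similar_items(target, views_by_user, top_n=3):
    """Rank other items by audience overlap with the target item."""
    audiences = item_audiences(views_by_user)
    target_users = audiences[target]  # raises KeyError for an unseen item
    scores = []
    for item, users in audiences.items():
        if item == target:
            continue
        overlap = len(target_users & users)
        if overlap:
            # Cosine similarity between two sets of viewers.
            score = overlap / sqrt(len(target_users) * len(users))
            scores.append((score, item))
    # Highest score first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scores[:top_n]]


# Made-up browsing history.
views = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove"},
    "cal": {"tent", "boots"},
    "dee": {"novel", "lamp"},
}

print(similar_items("tent", views))
```

Someone who has just viewed the tent would be shown the stove first, because two of the three other tent viewers also looked at it; the novel never appears because no tent viewer looked at it.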
Keys
Each person is assigned a key (an identifier) that is shared between websites. Sites use it to add to and read from a common pool of information about that person, which helps each site personalize the experience.
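A toy model makes the mechanism easier to picture. In the sketch below, several sites write what they observe into a shared store under the same user key, and any of them can read back the combined profile. The `SharedProfileStore` class, the key format, and the site names are hypothetical; real tracking systems are considerably more involved.

```python
class SharedProfileStore:
    """Toy shared store: observations from many sites, grouped by user key."""

    def __init__(self):
        self._profiles = {}

    def record(self, user_key, site, observation):
        """Append one observation made by a site about the user with this key."""
        by_site = self._profiles.setdefault(user_key, {})
        by_site.setdefault(site, []).append(observation)

    def profile(self, user_key):
        """Return a copy of everything known about this key, across all sites."""
        by_site = self._profiles.get(user_key, {})
        return {site: list(observations) for site, observations in by_site.items()}


# Made-up activity by one person on two unrelated sites.
store = SharedProfileStore()
store.record("key-123", "bookshop", "viewed: travel guides")
store.record("key-123", "airline", "searched: flights to Lima")

combined = store.profile("key-123")
print(combined)
```

Because both sites file their observations under the same key, either one can see what the other learned, which is the source of both the personalization and the privacy cost.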
Trade-Offs
Algorithmic judgments of personality can be more accurate than those made by people close to you. With over 300 Facebook likes, an algorithm can know you better than your spouse. Privacy is minimal, while personalization is complete. Key questions include:
- Do users have a right to privacy online?
- Who controls the algorithms?
- What accountability do they have?
Facebook: Challenges and Controversies
Facebook and Google have formed a “digital duopoly,” accounting for nearly half of global digital advertising spending and 77% of growth in the U.S. online advertising market in 2016. They significantly influence users, with 66% of the U.S. population getting news from Facebook. Almost 90% of Facebook’s 2.2 billion users are outside North America.
The Cambridge Analytica scandal involved mass collection and distribution of Facebook user data, sparking public discourse on user privacy rights. Social media platforms face calls to both moderate and not moderate content. Debates continue on changing Facebook’s business model to a subscription-based approach.
A 2018 U.K. survey revealed that nearly two-thirds of 33,000 respondents felt online companies were under-regulated, lacked transparency, and misused user data. Zuckerberg called for more regulation, focusing on privacy, harmful content, election integrity, and data portability. Facebook, Google, Apple, and Amazon spent over $50 million in 2017 lobbying Congress.
The Cambridge Analytica crisis caused Facebook’s market capitalization to drop by nearly $100 billion. Despite this, few advertisers halted spending. However, with lower profitability forecasts, Facebook’s stock plummeted by over 17%, wiping out nearly $130 billion in market value.