
8 July 2025

Centralization to Self-Sovereignty with AI Agents

The internet has undergone a remarkable transformation, evolving through distinct phases, each redefining how we interact with information and each other. From the static pages of Web1 to the dynamic experiences of Web2, the decentralized promise of Web3, and the identity-centric vision of Web5, this progression fundamentally reshapes digital possibilities, with agentic solutions playing an increasingly pivotal role.

Web2: The Social and Interactive Web 

Web2, often termed the "Social Web," emerged in the early 2000s, shifting from static content to user-generated content and interactive experiences. Platforms like Facebook, YouTube, and Amazon exemplify Web2, characterized by centralized control, rich user interfaces, and the rise of social media. Users could create, share, and collaborate, but their data and digital identities largely remained under the control of the platform providers.

  • Best Use Cases: Social networking, e-commerce, blogging, SaaS applications, online collaboration tools.
  • Agentic Solutions: In Web2, AI agents often function as sophisticated automation tools. For instance, customer service chatbots handle inquiries, content moderation bots filter inappropriate material, and data aggregation agents analyze user behavior to personalize advertisements or recommend content. These agents typically operate within the confines of a single platform, leveraging centralized data stores to perform their tasks.
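
The sketch below illustrates this Web2 pattern in TypeScript: an agent that reads user-generated posts from a platform's centralized API and flags suspect content back through that same API. The platform URL, endpoints, and keyword filter are hypothetical placeholders; the point is that the agent can only see and act on data the central platform exposes.

```typescript
// A minimal sketch of a Web2-style moderation agent. The platform API and its
// endpoints (api.example-platform.com) are hypothetical; the agent reads and
// writes solely through a single, centralized service that owns the data.
type Post = { id: string; author: string; text: string };

async function moderateNewPosts(apiToken: string): Promise<void> {
  const headers = { Authorization: `Bearer ${apiToken}` };

  // Fetch recent user-generated content from the platform's central store.
  const res = await fetch("https://api.example-platform.com/posts?status=new", { headers });
  const posts: Post[] = await res.json();

  for (const post of posts) {
    // A naive keyword filter standing in for a real moderation model.
    const flagged = ["spam", "scam"].some((word) => post.text.toLowerCase().includes(word));
    if (flagged) {
      // The agent can only act through the platform's own API.
      await fetch(`https://api.example-platform.com/posts/${post.id}/flag`, {
        method: "POST",
        headers,
      });
    }
  }
}
```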

Web3: The Decentralized and Ownership-Driven Web 

Web3 represents a paradigm shift towards decentralization, powered primarily by blockchain technology. Its core tenets include user ownership of data and digital assets, censorship resistance, and transparent, immutable transactions. Cryptocurrencies, Non-Fungible Tokens (NFTs), Decentralized Finance (DeFi), and Decentralized Autonomous Organizations (DAOs) are hallmarks of Web3, aiming to reduce reliance on intermediaries and empower individual users.

  • Best Use Cases: Decentralized finance (lending, borrowing), digital collectibles (NFTs), blockchain-based gaming (GameFi), decentralized governance (DAOs), and verifiable digital identity.
  • Agentic Solutions: AI agents in Web3 can interact directly with smart contracts and decentralized protocols. This includes automated trading bots on decentralized exchanges, governance bots that facilitate voting in DAOs, and agents that manage and verify digital assets. For implementation, developers write agents that connect to blockchain nodes (e.g., via Web3.js or Ethers.js), execute transactions, and interact with smart contract APIs, often leveraging decentralized storage solutions for their operational data.
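
As a rough illustration of that implementation pattern, the sketch below uses Ethers.js (assuming v6) to connect an agent to a blockchain node over JSON-RPC and read state from a smart contract. The RPC URL, contract address, and ABI fragment are placeholders; a real trading or governance agent would also sign and submit transactions.

```typescript
import { ethers } from "ethers";

// Placeholder values: substitute a real RPC endpoint, contract address, and ABI.
const RPC_URL = "https://rpc.example.org";
const CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000";
const ABI = ["function balanceOf(address owner) view returns (uint256)"];

async function checkBalance(owner: string): Promise<void> {
  // Connect the agent to a blockchain node over JSON-RPC.
  const provider = new ethers.JsonRpcProvider(RPC_URL);

  // Bind to the smart contract through its ABI.
  const token = new ethers.Contract(CONTRACT_ADDRESS, ABI, provider);

  // Read on-chain state; an automated agent would branch on values like this
  // before deciding whether to execute a transaction.
  const balance: bigint = await token.balanceOf(owner);
  console.log(`Balance of ${owner}: ${ethers.formatUnits(balance, 18)}`);
}

checkBalance("0x0000000000000000000000000000000000000001").catch(console.error);
```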

Web5: The Decentralized Web with Personal Data Control 

Still largely conceptual but rapidly gaining traction, Web5 is less about a new blockchain and more about a layer built atop existing decentralized technologies, specifically focusing on decentralized identity and personal data ownership. Pioneered by Jack Dorsey's TBD, Web5 envisions a web where users truly own their identity and control their data, rather than having it reside with third-party applications. It aims to empower individuals with Self-Sovereign Identity (SSI) and Decentralized Web Nodes (DWNs) that store personal data securely, allowing users to grant granular access permissions.

  • Best Use Cases: Self-sovereign digital identity, verifiable credentials (e.g., digital driver's licenses, academic degrees), secure personal data storage, privacy-preserving data sharing for personalized services without relinquishing control.
  • Agentic Solutions: AI agents in Web5 are designed with privacy and user control at their forefront. They can act as personal data guardians, managing access to a user's decentralized identity and data stores based on explicit consent. For implementation, these agents would utilize emerging Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) standards. An agent might, for instance, automatically present a verifiable age credential to a service without revealing the user's full date of birth, or grant a health app temporary access to specific fitness data, all while the user retains ultimate control over their data's lifecycle.
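
Because the Web5 tooling is still emerging, the sketch below deliberately avoids any real library API and uses hypothetical TypeScript types to show the selective-disclosure idea: the agent releases only the claims the user has consented to share, such as an ageOver18 flag, never the underlying date of birth.

```typescript
// Hypothetical types standing in for emerging DID/VC SDKs; no real library API is assumed.
interface VerifiableCredential {
  type: string[];
  issuer: string;                  // DID of the issuer, e.g. "did:example:issuer"
  subject: string;                 // DID of the holder
  claims: Record<string, unknown>; // e.g. { ageOver18: true, dateOfBirth: "..." }
  proof: string;                   // issuer signature, treated as opaque here
}

interface PresentationRequest {
  verifier: string;          // DID of the relying party
  requestedClaims: string[]; // e.g. ["ageOver18"]
}

// A personal-data-guardian agent: it discloses only the claims the user has
// explicitly consented to share with this verifier, and nothing else.
function present(
  credentials: VerifiableCredential[],
  request: PresentationRequest,
  userConsents: (verifier: string, claims: string[]) => boolean
): Record<string, unknown> | null {
  if (!userConsents(request.verifier, request.requestedClaims)) return null;

  const disclosed: Record<string, unknown> = {};
  for (const vc of credentials) {
    for (const claim of request.requestedClaims) {
      if (claim in vc.claims) disclosed[claim] = vc.claims[claim];
    }
  }
  return disclosed; // e.g. { ageOver18: true }, with the date of birth never leaving the agent
}
```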

Solid from Tim Berners-Lee: A Personal Data Store Vision 

Solid, an initiative by World Wide Web inventor Tim Berners-Lee, offers a distinct approach to data ownership. It proposes that individuals store their personal data in decentralized data stores called "Pods" (Personal Online Data Stores). Users control who can access their data and how it's used, effectively decoupling data from applications. Applications then request permission to read or write data to a user's Pod.
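
Since a Pod is exposed over plain HTTP, a minimal sketch of that request/permission flow can be written with nothing but fetch. The Pod URL below is hypothetical, and the Solid-OIDC authentication details are glossed over with a simple bearer token; the point is that the application reads and writes a resource the user's Pod controls, subject to the access the user has granted.

```typescript
// Hypothetical Pod resource; a real app would discover the Pod from the user's WebID profile.
const POD_RESOURCE = "https://alice.example.org/notes/todo.ttl";

// Append a note to a resource in the user's Pod. `accessToken` is assumed to
// come from a Solid-OIDC login flow (not shown); real deployments use DPoP-bound tokens.
async function appendNote(accessToken: string, note: string): Promise<void> {
  const headers = { Authorization: `Bearer ${accessToken}` };

  // Read the current resource; the Pod's access controls decide what this app may see.
  const res = await fetch(POD_RESOURCE, { headers });
  const turtle = res.ok ? await res.text() : "";

  // Append the note as a simple Turtle statement and write the resource back.
  const updated = `${turtle}\n<#note-${Date.now()}> <http://schema.org/text> "${note}" .`;
  await fetch(POD_RESOURCE, {
    method: "PUT",
    headers: { ...headers, "Content-Type": "text/turtle" },
    body: updated,
  });
}
```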

Web2, Web3, Web5 vs. Solid: Key Differences 

While all these concepts aim for a better internet, their approaches differ. Web2 is centralized, with platforms owning data. Web3 introduces decentralization primarily through blockchain for digital assets and transactions, where data ownership is often tied to blockchain addresses. Web5 builds on decentralization, specifically emphasizing self-sovereign identity and personal data control, often leveraging DIDs and VCs. Solid, on the other hand, focuses on a more direct model of personal data storage in Pods, where users maintain direct control over their data's location and access, regardless of the underlying technology (blockchain or otherwise). While Web3 and Web5 often rely on blockchain for trust and immutability, Solid's core innovation is the Pod, which can theoretically exist on various decentralized storage solutions, not exclusively blockchain. Web5's emphasis on DIDs and VCs aligns closely with Solid's goals of user-controlled identity and verifiable data.

In essence, the evolution from Web2 to Web3 and Web5 reflects a continuous drive towards greater user empowerment and decentralization. Agentic solutions, from centralized automation to decentralized identity managers, are crucial enablers at each stage, transforming how we interact with the web and how our digital lives are managed. Solid provides a complementary, highly focused vision for personal data control within this evolving ecosystem.

16 May 2025

Supercomputers and the Future of Computing

The concept of a supercomputer once evoked images of vast, power-hungry machines housed in specialized facilities. These behemoths of computing were the domain of governments, research institutions, and large corporations. However, the relentless march of technological progress, particularly in quantum computing and nanotechnology, is poised to revolutionize this landscape. The future may hold a world where the power of a supercomputer is accessible in a device no larger than a household appliance, transforming how we live, work, and interact with technology.

The key to this transformation lies in the convergence of two groundbreaking fields: quantum computing and nanotechnology. Quantum computing harnesses the bizarre principles of quantum mechanics to perform calculations that are impossible for even the most powerful classical computers. While still in its nascent stages, quantum computing holds the potential to solve complex problems in fields like drug discovery, materials science, and artificial intelligence, problems that currently strain our computational capabilities.

Nanotechnology, on the other hand, deals with manipulating matter at the atomic and molecular scale. This ability to engineer materials and devices at such a minute level opens the door to incredible miniaturization. Imagine transistors, the fundamental building blocks of computers, shrunk down to the size of a few atoms. This level of miniaturization, combined with novel materials, could dramatically increase computing power while significantly reducing size and energy consumption.

The implications of bringing this level of computing power into the home are staggering. A miniature, quantum-powered supercomputer could become the central hub for a vast array of applications.

  • Data Centers in a Box: The need for massive, centralized data centers could be reduced, or at least augmented, by distributed computing power within homes. Imagine a network of home-based supercomputers contributing to global research efforts or providing localized, secure data storage.
  • AI Modeling Unleashed: Complex AI models that currently require immense processing power could be developed and run locally. This would democratize AI research and development, allowing individuals and small businesses to create sophisticated applications. Imagine personalized AI tutors, advanced home automation systems that learn and adapt to your every need, and highly sophisticated creative tools.
  • Revolutionizing Numerical Computation: Fields that rely heavily on complex simulations, such as weather forecasting, financial modeling, and engineering design, would be transformed. Imagine highly accurate, real-time simulations available to anyone, enabling better predictions, optimized designs, and a deeper understanding of complex systems.
  • Beyond the Imagination: These miniature supercomputers could also power applications we haven't even conceived of yet. The availability of such immense computational power could spark a new era of innovation, leading to breakthroughs in fields we can only dream of today.

However, significant challenges remain before this vision can become a reality. Building stable and scalable quantum computers is a formidable task, and integrating them with nanotechnological components presents even greater hurdles. Issues such as error correction, thermal management, and the development of quantum-specific software need to be addressed. Furthermore, the cost of such technology would need to decrease dramatically to make it accessible to the average household.

Despite these challenges, the potential rewards are too great to ignore. The development of miniature, quantum-powered supercomputers could usher in a new era of computing, characterized by decentralization, accessibility, and unprecedented power. This technology has the potential to transform our lives in profound ways, empowering individuals, driving innovation, and unlocking new frontiers of knowledge. As research in quantum computing and nanotechnology progresses, the dream of a supercomputer in every home may be closer than we think.

12 February 2025

Rust Sucks

Rust is marketed as a better systems programming language for performance-sensitive applications, with a particular focus on memory safety and concurrency. But much of how it achieves this is opaque, inaccessible, and hidden from the developer. Not to mention, it all comes with a steep learning curve. So, the question to ask is: is it really worth it?

Ownership and Borrowing: These features help prevent memory leaks and data races. But they are difficult to understand and profile, especially if you are used to garbage collection.

Lifetimes: These help ensure memory safety. But, again, they are complex and difficult to reason about.

Complex Type System: It comes with a sophisticated type system to catch errors at compile time. But, again, it can be difficult to understand.

Steep Learning Curve: It simply has a steep learning curve that requires time to learn. Time that is spent being less productive in actually delivering on work. This means it is more of an academic language for people who have all the time in the world to learn a new language. If it takes so much time to learn, then is it really worth it in the end? By the time you become competent at it, there will likely be a better programming language with a simpler approach to doing things. Complex languages are also more difficult to test.

Compilation Time: It can be slow, very slow, as a result of the extensive checks to ensure memory safety and prevent data races, all happening under the hood. Should you trust it? This leads to longer compilation times. Time that could be better spent, like maybe getting a cup of coffee?

Verbosity: It is explicit, which leads to more code. More code leads to more tests! This means more development time and larger codebases.

Ecosystem Maturity: Let's just say it is growing. This means fewer readily available libraries and tools for certain tasks. And, more than likely tons of undiscovered and unresolved bugs in the backlog.

Cognitive Overhead: Developers have to think more explicitly about memory management, even though the language does not require manual memory allocation. This means a lot of cognitive overhead, making the whole development process more challenging. You are surrounded by complexity, defeating the whole premise of "Keep it simple, stupid" and the quote often heard in complexity circles: "Complexity is the root of all evil".

Not Suitable for All Tasks: This language is still very much domain-specific. Tasks it can be good for are systems programming, embedded systems, and other performance-sensitive applications. Especially if you are keen on making things more complex than they need to be. In most workplaces, agility matters in getting things done, and this programming language will not be useful for the majority of development tasks.

Error Handling: Very verbose error handling that requires more code.

String Handling: Way too many string types.

IDE Support: Let's just say it is improving and not as feature-rich as for more established languages.

Debugging: Imagine a language that focuses on memory safety but is a challenge to debug. Most things in this language just go against the grain of being productive and instead focus on academic rituals of memory safety. It will make you pull your hair out in frustration.

State of Rust Survey 2024

11 February 2025

Netflix Maestro

The bad side of Neo4j

Bad Horizontal Scaling: Distributing data and queries across cluster shards is complex, not fully supported, less mature, and harder to manage in distributed architectures, which is a problem for very large datasets and high-throughput workloads

Memory Limitations: Support is mainly in-memory, where the majority of the graph data sits in RAM; for large graphs that exceed the available memory, performance degrades

Query Performance and Tuning: Optimizing queries is challenging and requires understanding the entire query plan and the indexes, which is counter-intuitive; why not then just use a relational database like Postgres?
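
As a small illustration of what that tuning work looks like in practice, the sketch below uses the official neo4j-driver for JavaScript/TypeScript to run a Cypher query prefixed with PROFILE and print the execution plan from the result summary. The connection details and the query itself are placeholders.

```typescript
import neo4j from "neo4j-driver";

// Placeholder connection details.
const driver = neo4j.driver(
  "neo4j://localhost:7687",
  neo4j.auth.basic("neo4j", "password")
);

async function profileQuery(): Promise<void> {
  const session = driver.session();
  try {
    // Prefixing the query with PROFILE makes Neo4j return the executed plan,
    // which is what query tuning revolves around: operators, estimated rows, db hits.
    const result = await session.run(
      "PROFILE MATCH (p:Person)-[:KNOWS]->(f:Person {name: $name}) RETURN p.name",
      { name: "Alice" }
    );
    console.log(JSON.stringify(result.summary.profile, null, 2));
  } finally {
    await session.close();
    await driver.close();
  }
}

profileQuery().catch(console.error);
```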

Commercial Licensing Costs: Expensive for large deployments and advanced features

Community Edition Limitations: Limited features, scalability, and support

Limited Sharding Capabilities: Sharding is not fully supported, setup and management can be problematic and complex

Focus on Property Graphs: Does not support other graph models and paradigms such as RDF

Full-Text Search Limitations: Lacks advanced and dedicated search capabilities

Backup and Recovery: Limited and complex backup and recovery especially for clustered environments and very large datasets, problematic for point-in-time recovery or restoring from a distributed backup

Monitoring and Management: Requires specialized tools and can be complex

Vendor Lock-in: Cypher is tightly coupled to Neo4j, which may lead to vendor lock-in

Data Import/Export: Import/Export of very large datasets is problematic and time-consuming

Integration: In many cases custom development with other systems may be required

Driver Maturity and Consistency: Maturity of language drivers and feature parity can vary which may lead to inconsistencies and limitations

Limited Support for Some Languages: Drivers for less common languages may be less mature, which may lead to maintenance and feature lag

Cypher Quirks: Frustrating quirks and edge cases for developers that may lead to unexpected behavior, requires understanding the query plan and execution

Stored Procedures: These can add complexity to the development process

Schema Evolution: Evolving the data model, such as adding new properties and relationships, can be problematic, especially during data migration

Data Validation: Ensuring data quality and consistency requires careful planning and implementation of validation logic at the application level

Integration with other Graph Systems: Differences in data models and query languages can be problematic

Deployment Complexity: Setting up and managing a clustered Neo4j deployment can be complex and requires careful configuration

Security Hardening: Requires careful configuration and maintenance, especially around specific settings and potential vulnerabilities

Tooling: Less mature for monitoring, profiling, and management

Resource Consumption: Very resource-intensive, especially for large graphs and complex queries, which requires capacity planning and resource management

Reasoning: Being mainly a property graph database, it lacks inference and reasoning ability; additional RDF support can be achieved via tools like neosemantics, but they also lack reasoning functionality; it is difficult to optimize for SPARQL queries, and significant custom development is required for semantic and linked data

Generative AI: Terribly slow for generative AI and integration with LLMs, with poor query performance for specific query tasks in GraphRAG; best to use alternatives that can handle large datasets and more flexible queries, and requires careful consideration of the chunking strategy on branches

17 December 2024

Java sucks for startups

Java is no doubt a powerful and robust programming language with a huge ecosystem. But, it is not ideal for startups. In fact, it may not even be all that good for larger projects either.

  • Can be a continuous and steep learning curve, especially hindered by the six-monthly release cycle
  • Verbose syntax
  • Often a pain with longer development cycles
  • Heavier runtime environment
  • Again, those six-monthly releases are frustrating and annoying
  • Higher resource consumption
  • Huge outlay for cloud development
  • Less flexible ecosystem
  • Initial setup and configuration time
  • Build and deployment complexity
  • Potential for major performance bottlenecks
  • Licensing costs
  • Time and effort in manhours
  • Lots of unnecessary boilerplate code
  • Compilation times
  • Strict typing can be a headache
  • Slow development cycles
  • Lots of third-party libraries that can lead to security vulnerabilities with larger attack surfaces
  • Auditing complexity due to larger codebases
  • Leads to dependency hell from transitive dependencies, to version conflicts, to outdated dependencies that can cause build failures and runtime errors

31 October 2024

Java vs Go vs C vs C++

The below highlights the key areas where each of these programming languages is used and summarizes their characteristic differences.

Java

  • Syntax: More verbose, object-oriented
  • Concurrency: Thread-based
  • Memory Management: Garbage Collection
  • Ecosystem: Mature, extensive libraries and frameworks
  • Performance: Generally slower startup time, but good performance at runtime
  • Learning Curve: Steeper learning curve
  • Use Cases: Enterprise Application, Android Development, Big Data and Data Science

Go

  • Syntax: Concise, more procedural
  • Concurrency: Goroutines and channels
  • Memory Management: Garbage collection
  • Ecosystem: Growing, but less mature than Java
  • Performance: Faster compilation and runtime
  • Learning Curve: Easier to learn
  • Use Cases: Microservices Architecture, Cloud-Native Applications, Network Programming and Systems Programming, High-Performance Applications

C

  • Syntax: Low-level, procedural
  • Concurrency: Threads
  • Memory Management: Manual
  • Ecosystem: Smaller ecosystem, but focused on system-level programming
  • Performance: High performance, low-level control
  • Learning Curve: Steep learning curve
  • Use Cases: Systems Programming, Embedded Systems, Operating Systems

C++

  • Syntax: Complex, object-oriented
  • Concurrency: Threads
  • Memory Management: Manual
  • Ecosystem: Large, complex ecosystem
  • Performance: High performance, fine-grained control
  • Learning Curve: Steep learning curve
  • Use Cases: High-Performance Applications, Game Development, Scientific Computing

22 January 2021

Federated Protocol

  • Mastodon
  • NextCloud 
  • PeerTube 
  • Friendica 
  • Mobilizon
  • Pixelfed 
  • Pleroma 
  • Misskey