• Snowflake, Fabric and Power BI Integration Options
    Figure 2.0 – Snowflake and Fabric / Power BI integration options

    Back in September I published an article reviewing options for sharing lakehouse-format data amongst Fabric, Snowflake, and Databricks. The article can be found by clicking here. This article expands on connectivity options between Snowflake and Fabric / Power BI, covering both traditional SQL endpoint connectivity and new lakehouse integration options. Interoperability between Snowflake and Fabric / Power BI is a hot topic these days; Vice Presidents from both companies even co-authored an article on the topic at this link.

    Per Figure 2.0 above, each of the seven options will be explained along with potential use case pros and cons. After taking a deep dive into these options while preparing for this article, my opinion is that all of them will have scenarios for which they make sense, albeit some should be reserved for niche architectural use cases. Many factors influence the connectivity choices for your projects, including project personas (IT vs Business), data latency requirements, report and AI agent rendering time expectations, cost management priorities, and cloud egress costs.

    Please note that the views expressed in this article are my own and do not necessarily reflect the views of my employer. I am not comparing Fabric / Power BI versus Snowflake, but rather discussing these options as ways to make them work better together. Also, I plan to write a similar article for Fabric / Power BI integration options with Databricks in the near future.

    For each of the seven options for Fabric / Power BI connectivity and integration in Figure 2.0 above, I have added a section below with a diagram and some examples of how the option might benefit a project:

    1. Fabric / Power BI read Snowflake SQL endpoint

    Figure 2.1 – Fabric / Power BI query the Snowflake SQL endpoint

    Power BI reading the Snowflake SQL endpoint has been generally available and supported for many years. Based on what I see in my daily work, most solutions built with Power BI and Snowflake use this method of connectivity. Fabric tools and Power BI Semantic Models can query Snowflake in a similar manner as SQL Server, Oracle, Teradata, Redshift, etc. I further break this connectivity method into three modes: Import Mode, Direct Query Mode, and Composite Mode. Import Mode is also similar to how Fabric tools such as Dataflows, Dataflows Gen 2, and Pipelines query the Snowflake SQL endpoint in iterative batches.

    1.1 Import Mode

    Power BI Semantic Model Import Mode queries the Snowflake SQL endpoint on a schedule or when triggered via an API. The results of the query are cached in the compressed, columnar storage of a Fabric Semantic Model, which is optimized for a reduced data storage footprint, high query complexity, and high query concurrency. When architected correctly, I’ve seen Import Mode Semantic Models perform well with hundreds of millions and sometimes billions of rows in a table.

    Fabric tools such as Dataflows, Dataflows Gen 2, and Pipelines also query the Snowflake SQL endpoint in a similar manner, so I have included them in this mode.
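
    Refresh schedules are configured in the Power BI / Fabric service, but Import Mode refreshes can also be triggered programmatically. Below is a minimal Python sketch using the Power BI REST API refresh endpoint; the workspace ID, semantic model ID, and token acquisition are placeholders rather than values from Figure 2.1.

        # Minimal sketch: trigger an Import Mode semantic model refresh via the Power BI REST API.
        # WORKSPACE_ID, SEMANTIC_MODEL_ID, and ACCESS_TOKEN are placeholders; the token needs a
        # scope such as Dataset.ReadWrite.All and the caller needs permission on the workspace.
        import requests

        WORKSPACE_ID = "<workspace-guid>"
        SEMANTIC_MODEL_ID = "<semantic-model-guid>"
        ACCESS_TOKEN = "<azure-ad-bearer-token>"

        url = (
            f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
            f"/datasets/{SEMANTIC_MODEL_ID}/refreshes"
        )
        response = requests.post(
            url,
            headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
            json={"notifyOption": "MailOnFailure"},
        )
        response.raise_for_status()  # HTTP 202 means the refresh request was accepted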

    When is an Import Mode Semantic Model a good choice?

    Snowflake Administrators can serve up data to users via a SQL endpoint. There’s no need to grant end users access to build anything in Snowflake, and the Snowflake team doesn’t need to do anything in Fabric / Power BI. Fabric / Power BI users can still do transformations with Power Query or Dataflows / Dataflows Gen 2. Using Fabric Pipelines to ingest tables, views, or custom queries from Snowflake is an option for SQL query-based ingestion into Fabric.

    What are the risks of an Import Mode Semantic Model?

    Data latency can be an issue if data from Snowflake needs to be less than thirty minutes old. Large or complex queries can also take longer to run on Snowflake and can increase costs in pay-per-query scenarios with Snowflake. If many semantic models are hitting the same Snowflake tables, duplicate queries could compound the problem and increase total costs. Also, Import Mode Semantic Models can reach performance limits with a combination of extreme query complexity and extremely large volumes of data.

    1.2 Direct Query Mode

    Direct Query Mode will not create a cached columnar database in the Fabric Semantic Model. The Fabric Semantic Model will convert queries to SQL and send them directly to Snowflake for processing.

    When is a Direct Query Mode Semantic Model a good choice?

    Direct Query Mode to Snowflake should be a niche option reserved for either 1) use cases requiring extreme low latency for data freshness, or 2) extremely large tables of data (too large for Import Mode) with simple query logic.

    What are the risks of a Direct Query Mode Semantic Model?

    Every visual on a report, and every query from an AI data agent, sends a SQL query back to Snowflake. If there are 15 visuals on the screen, clicking a filter value sends 15 queries. If 10 people are using the report concurrently, there will be 15 visuals x 10 people = 150 potentially complex queries running concurrently on Snowflake. Costs will usually be higher on the Snowflake side of the architecture, and reports may not render as quickly as in Import Mode.

    1.3 Composite Mode

    Composite Mode is an option to use both Import Mode and Direct Query Mode in the same Fabric / Power BI Semantic Model. Composite Mode Semantic Models offer the best of both worlds, and they are also ideal candidates for advanced capabilities such as Aggregations that improve query performance on large and complex data models. I wrote an article about Composite Mode Semantic Models and Aggregations a few years ago at this link. Composite Mode also supports Direct Lake Mode Semantic Models as sources, which are covered as part of options 2-4 below.

    When is a Composite Mode Semantic Model a good choice?

    Composite Mode Semantic Models are a good option when a complex data model is not a good fit for either Import Mode or Direct Query Mode alone. Extremely large fact tables mixed with smaller fact tables and dimensions are usually a good fit for Composite Mode.

    What are the risks of a Composite Mode Semantic Model?

    Individual tables within a Composite Mode Semantic Model have the same inherent risks as Import and Direct Query Mode. Also, the complexity of the design requires strong knowledge of data modeling best practices.

    2. Fabric Mirroring of Snowflake

    Figure 2.2 – Fabric OneLake can mirror tables from Snowflake using Change Data Capture (CDC)

    With a large Power BI environment or numerous AI Data Agents, queries between Snowflake and Fabric / Power BI can be complex with extremely high concurrency. Even for Import Mode Semantic Models using the Snowflake SQL endpoint, numerous semantic models refreshing frequently can result in high compute costs and duplicate data moving across networks. Fabric OneLake has a capability called Mirroring which can be used to optimize costs and query performance when connecting to Snowflake. Mirroring connects to a Snowflake table’s change data capture (CDC) mechanism to pull updates into OneLake as incrementally updated delta parquet tables. Fabric / Power BI tools and reports can then query the copy of the table, which has been moved over the network from Snowflake once, as opposed to each query crossing the network separately. The Fabric compute and storage costs for Mirroring are also free up to limits described at this link.
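
    To illustrate the "moved over the network once" point, here is a minimal Python sketch (run inside a Fabric Spark notebook) of reading the mirrored copy of a Snowflake table directly from OneLake as Delta. The workspace, mirrored database item, and table names are placeholders, and the exact OneLake path pattern for your mirrored item should be confirmed in the OneLake file explorer.

        # Minimal sketch: query the mirrored Delta table in OneLake; no query is sent to Snowflake.
        # The abfss path below is a placeholder pattern - verify the item path for your mirrored database.
        mirrored_path = (
            "abfss://<workspace-name>@onelake.dfs.fabric.microsoft.com/"
            "<mirrored-database-name>.MirroredDatabase/Tables/dbo/<table-name>"
        )
        df = spark.read.format("delta").load(mirrored_path)
        df.limit(10).show()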

    When is Mirroring a good choice?
    • Numerous Fabric / Power BI items are querying the same table in Snowflake
    • The Snowflake table supports CDC (is not a view)
    • The Snowflake table is updated periodically with small changes or in small batches (not a Type 1 table with full refreshes)
    • Cost savings for both Snowflake and Fabric compute are a goal
    • Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
    What are the risks of Mirroring?
    • Snowflake views and custom queries are not supported, only tables.
    • Type 1 (full refresh) Snowflake tables, or tables that have bulk daily updates, are not good candidates for Mirroring
    • If Mirroring breaks or needs to be reset, there may be a delay as the historical data re-populates
    • Security must be reconfigured on the new mirrored tables in Fabric / Power BI

    3. Snowflake write Iceberg to Fabric

    Figure 2.3 – Snowflake writes Iceberg tables to Fabric OneLake

    Tools in Fabric / Power BI are optimized to run on delta parquet or Iceberg tables in OneLake. Snowflake natively supports the Iceberg format and can write Iceberg tables to external storage. Writing those Iceberg tables to OneLake enables Fabric / Power BI tools to query the Iceberg tables directly without crossing between the platforms every time a query runs. The Snowflake team can write the table to Fabric, keep it updated, and then all the compute and traffic from Fabric / Power BI can be contained within Fabric.
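
    As a rough illustration of the workflow, below is a Python sketch in which the Snowflake team creates an external volume pointing at OneLake and then writes an Iceberg table into it with a CTAS statement. All account names, URLs, paths, and object names are placeholders, and the exact OneLake external volume configuration should be taken from the Snowflake CREATE EXTERNAL VOLUME documentation linked in the Summary of Options later in this article.

        # Hedged sketch: Snowflake writes an Iceberg table into Fabric OneLake.
        # Account, tenant, workspace, lakehouse, and table names are all placeholders.
        import snowflake.connector

        conn = snowflake.connector.connect(
            account="<account>", user="<user>", authenticator="externalbrowser"
        )
        cur = conn.cursor()

        # External volume pointing at a Fabric lakehouse location in OneLake
        cur.execute("""
            CREATE EXTERNAL VOLUME IF NOT EXISTS onelake_vol
              STORAGE_LOCATIONS = ((
                NAME = 'fabric_lakehouse'
                STORAGE_PROVIDER = 'AZURE'
                STORAGE_BASE_URL = 'azure://onelake.dfs.fabric.microsoft.com/<workspace>/<lakehouse>.Lakehouse/Files/iceberg/'
                AZURE_TENANT_ID = '<tenant-guid>'
              ))
        """)

        # Write (or rebuild) the Iceberg table from a Snowflake table, view, or custom query
        cur.execute("""
            CREATE OR REPLACE ICEBERG TABLE sales_iceberg
              CATALOG = 'SNOWFLAKE'
              EXTERNAL_VOLUME = 'onelake_vol'
              BASE_LOCATION = 'sales_iceberg'
              AS SELECT * FROM analytics.public.vw_sales
        """)
        cur.close()
        conn.close()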

    When is Snowflake writing Iceberg to Fabric a good choice?
    • Numerous Fabric / Power BI items are querying the same table in Snowflake
    • Snowflake data can be written from a table, a view, or a custom query
    • The Snowflake environment can be configured to update the table in Fabric as needed. Less frequent updates will reduce total compute usage
    • Data is sent from Snowflake to Fabric by the Snowflake team, so end users do not need permission to query Snowflake
    • Cost savings for both Snowflake and Fabric compute are a goal
    • Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
    What are the risks of Snowflake writing Iceberg to Fabric?
    • Refreshes are triggered from Snowflake, not by end users in Fabric / Power BI
    • Security must be reconfigured on the new iceberg tables in Fabric / Power BI

    4. Fabric Shortcut to Snowflake Iceberg

    Figure 2.4 – Fabric OneLake can shortcut to Snowflake iceberg tables

    In addition to Snowflake writing Iceberg tables to Fabric OneLake, Fabric OneLake can also shortcut to Iceberg tables in the Snowflake environment. This scenario is similar to Snowflake writing Iceberg to Fabric (previous option), but the storage will be outside of Fabric and accessed by OneLake as needed.
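
    Shortcuts are typically created through the Fabric portal, but they can also be scripted. The Python sketch below uses the OneLake shortcuts REST API to point a lakehouse Tables folder at an Iceberg table that Snowflake maintains in external ADLS Gen2 storage; the IDs, the connection, and the request payload shape are assumptions to verify against the shortcuts API reference, not a definitive recipe.

        # Hedged sketch: create a OneLake shortcut to a Snowflake-managed Iceberg table location.
        # Workspace/item/connection GUIDs, the token, and the target payload are placeholders.
        import requests

        WORKSPACE_ID = "<workspace-guid>"
        LAKEHOUSE_ID = "<lakehouse-item-guid>"
        TOKEN = "<fabric-api-bearer-token>"

        response = requests.post(
            f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
            f"/items/{LAKEHOUSE_ID}/shortcuts",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "path": "Tables",
                "name": "snowflake_sales_iceberg",
                "target": {
                    "adlsGen2": {
                        "connectionId": "<connection-guid>",
                        "location": "https://<storage-account>.dfs.core.windows.net",
                        "subpath": "/<container>/iceberg/sales_iceberg",
                    }
                },
            },
        )
        response.raise_for_status()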

    When is Fabric Shortcut to Snowflake Iceberg a good choice?
    • Numerous Fabric / Power BI items are querying the same table in Snowflake
    • Snowflake data can be written from a table, a view, or a custom query
    • The Snowflake environment can be configured to update the Iceberg table as needed. Less frequent updates will reduce total compute usage
    • Only the Fabric team setting up shortcuts will need access to Snowflake
    • Cost savings for both Snowflake and Fabric compute are a goal
    • Direct Lake Semantic Models are used for Power BI or AI Fabric Data Agents
    What are the risks of Fabric Shortcut to Snowflake Iceberg?
    • Refreshes to the Iceberg tables are configured in Snowflake, not by end users in Fabric / Power BI
    • Security must be reconfigured on the new iceberg tables in Fabric / Power BI
    • Fabric OneLake shortcuts will still pull the data across the network from Snowflake when the data is needed for a query or Direct Lake Semantic Model.

    5. Snowflake Read Fabric SQL endpoint

    Figure 2.5 – Snowflake can read the Fabric SQL endpoint

    What if the Snowflake team needs a secure and supported means to query data from Fabric OneLake into Snowflake? For example, business users may load data into OneLake from Excel, flat files, or Power Platform tools. Snowflake can query the Fabric SQL endpoint to pull that data into Snowflake using SQL queries. The Fabric SQL endpoint should be a niche option for Snowflake users, and the next option on this list will likely be a better choice (6. Snowflake read table from Fabric as Iceberg).
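
    One generic pattern for this option is a scheduled job that runs a SQL query against the Fabric SQL endpoint and lands the result in Snowflake. The Python sketch below is only an illustration of that pattern; the driver, authentication method, and object names are placeholders and will vary by environment.

        # Hedged sketch: pull curated data from the Fabric SQL endpoint and load it into Snowflake.
        # Connection strings, auth methods, and table names are placeholders.
        import pandas as pd
        import pyodbc
        import snowflake.connector
        from snowflake.connector.pandas_tools import write_pandas

        fabric_conn = pyodbc.connect(
            "Driver={ODBC Driver 18 for SQL Server};"
            "Server=<sql-endpoint>.datawarehouse.fabric.microsoft.com;"
            "Database=<lakehouse-or-warehouse-name>;"
            "Authentication=ActiveDirectoryInteractive;"
        )
        df = pd.read_sql("SELECT * FROM dbo.curated_uploads", fabric_conn)

        sf_conn = snowflake.connector.connect(
            account="<account>", user="<user>", authenticator="externalbrowser",
            database="ANALYTICS", schema="STAGING", warehouse="LOAD_WH",
        )
        write_pandas(sf_conn, df, table_name="CURATED_UPLOADS", auto_create_table=True)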

    When is Snowflake reading the Fabric SQL endpoint a good choice?
    • Business users have uploaded curated data to Fabric, and the data is needed for projects in Snowflake
    • Projects need a secure platform for business users to upload data (Fabric OneLake) as a source for Snowflake
    • Data from sources that natively integrate with Fabric OneLake are needed for Snowflake projects
    What are the risks of Snowflake reading the Fabric SQL endpoint?
    • Connecting to OneLake and reading tables as Iceberg is probably a better option for most use cases (next option below)
    • Authentication to Fabric with this option requires using managed identity

    6. Snowflake read table from Fabric as Iceberg

    Figure 2.6 – Snowflake can read tables from Fabric OneLake as Iceberg

    Just as Fabric can shortcut to Snowflake and read Iceberg tables, Snowflake can also connect to Fabric OneLake and read tables as Iceberg. In most scenarios, Snowflake reading tables from Fabric OneLake as Iceberg should be the best method to get data from Fabric into Snowflake from the perspective of cost and scalability.
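
    Mechanically, this typically means registering the Iceberg metadata that OneLake exposes for a Fabric table inside Snowflake. The Python sketch below uses an external volume plus an object-store catalog integration; names, paths, and the metadata file location are placeholders, and the exact setup should follow the OneLake Iceberg documentation linked in the Summary of Options below.

        # Hedged sketch: Snowflake reads a Fabric OneLake table as Iceberg.
        # Assumes an external volume (onelake_vol) pointing at the OneLake location already exists.
        import snowflake.connector

        conn = snowflake.connector.connect(
            account="<account>", user="<user>", authenticator="externalbrowser"
        )
        cur = conn.cursor()

        cur.execute("""
            CREATE CATALOG INTEGRATION IF NOT EXISTS onelake_iceberg_cat
              CATALOG_SOURCE = OBJECT_STORE
              TABLE_FORMAT = ICEBERG
              ENABLED = TRUE
        """)

        # Point at the Iceberg metadata that OneLake exposes for the Fabric table
        cur.execute("""
            CREATE OR REPLACE ICEBERG TABLE fabric_curated_sales
              EXTERNAL_VOLUME = 'onelake_vol'
              CATALOG = 'onelake_iceberg_cat'
              METADATA_FILE_PATH = '<table-folder>/metadata/<latest>.metadata.json'
        """)

        cur.execute("SELECT COUNT(*) FROM fabric_curated_sales")
        print(cur.fetchone())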

    When is Snowflake reading a table from Fabric as Iceberg a good choice?
    • Business users have uploaded curated data to Fabric, and the data is needed for projects in Snowflake
    • Projects need a secure platform for business users to upload data (Fabric OneLake) as a source for Snowflake
    • Data from other sources that natively integrate with Fabric OneLake are needed for Snowflake projects
    • Snowflake team will use Snowflake compute to work with data from Fabric
    What are the risks of Snowflake reading a table from Fabric as Iceberg?

    Compared to other options for getting Fabric data into Snowflake, I cannot find any relative risks.

    7. Snowflake ingest Fabric Real-Time data as Iceberg

    Figure 2.7 – Fabric Real-Time data can be ingested into Snowflake for cold path storage and analytics

    Fabric Real-Time Intelligence is an area of great potential growth for analytics and AI. Data can be pushed into Fabric (rather than pulled) and used for alerting and agentic operations. Real-Time Intelligence is often referred to as the hot path of a lambda architecture. For the cold path of a lambda architecture, the streaming data is stored for deep analytics and machine learning purposes. Many data science and analytics teams use Snowflake for storing historical data. Snowflake can ingest Fabric Eventhouse (Real-Time) data via OneLake as Iceberg for cold path storage.

    When is Snowflake ingesting Fabric Real-Time data as Iceberg a good choice?
    • Fabric Real-Time Intelligence tools are used for lambda hot path AI, alerting and analytics but cold path data is stored and used in Snowflake
    • Snowflake teams need access to data that is easily streamed into Fabric from Fabric-friendly sources
    What are the risks of Snowflake ingesting Fabric Real-Time data as Iceberg?

    Compared to other options for getting Fabric Real-Time data into Snowflake, I cannot find any relative risks.

    Summary of Options

    Option: Reference URL
    • Fabric read Snowflake SQL endpoint: https://learn.microsoft.com/en-us/fabric/data-factory/connector-snowflake-overview and https://learn.microsoft.com/en-us/power-bi/connect-data/service-connect-snowflake
    • Fabric mirroring of Snowflake DB (Copies Metadata & Data): https://learn.microsoft.com/en-us/fabric/mirroring/snowflake
    • Snowflake write Iceberg to Fabric: https://docs.snowflake.com/en/sql-reference/sql/create-external-volume#label-create-external-volume-onelake
    • Fabric shortcut to Snowflake Iceberg: https://learn.microsoft.com/en-us/fabric/onelake/onelake-iceberg-tables
    • Snowflake read Fabric SQL endpoint: https://learn.microsoft.com/en-us/fabric/data-warehouse/query-warehouse
    • Snowflake read table from Fabric as Iceberg: https://blog.fabric.microsoft.com/en-us/blog/new-in-onelake-access-your-delta-lake-tables-as-iceberg-automatically?ft=All
    • Snowflake ingest Fabric Real-Time data as Iceberg: Query Fabric OneLake Delta tables from Snowflake – Microsoft Fabric | Microsoft Learn

    Closing Thoughts

    All seven of these options for connectivity between Snowflake and Fabric will likely have use cases in the real world. In my opinion, the following options will be the most popular due to performance, cost, and administrative considerations:

    • 3. Snowflake write Iceberg to Fabric – High query complexity and user concurrency are the primary drivers of unnecessary costs when using Power BI with Snowflake. This option will allow Snowflake teams to write data (tables, views, custom queries) to OneLake on a schedule or when triggered, and then hundreds of queries can hit OneLake for AI and reporting use cases. In theory, this method should minimize cross-platform traffic and reduce total costs while offering fast query response times to end users.
    • 1.3 Composite Mode Semantic Models – Leverage both Import Mode and Direct Lake Mode for the best of both worlds, and can be used in combination with options 2 and 3 above.
    • 6. Snowflake read table from Fabric as Iceberg – When Snowflake users need data from OneLake, this option should offer the best cost and performance.
    • 7. Snowflake ingest Fabric Real-Time data as Iceberg – Similar to #6, but in the context of pulling data from the Fabric EventHouse for cold path storage and analytics in Snowflake.

    Feedback and suggestions are welcome via my LinkedIn page or Twitter.

  • Video – Fabric Semantic Models add Good Math to AI Agents in Power BI, Azure AI Foundry, and Copilot Studio

    We are currently experiencing a generational surge in AI use cases and transformation. But will AI be able to connect directly to your raw data, perform all the necessary transformations, apply business-friendly names to fields, and add accurate logic to solutions? Does data engineering and architecture still matter?

    The video embedded at the bottom of this article is a recorded version of the presentation that I gave at SQL Saturday 2025 Minnesota. In February I will be presenting an updated version of this presentation at the Power BI & Fabric Summit, which is organized by Radacad. You can register for the upcoming Summit at this link: https://fabricsummit.com/product/PowerBI-Fabric-Summit-2026. I expect these tools to evolve quite a bit over the next few months!

    A well-known limitation of AI—specifically large language models (LLMs)—is that they are not fundamentally designed to perform accurate math. While newer LLMs can handle some mathematical tasks, query speeds are often slow, and the compute costs can be high. Translating the specialized context of natural language questions into precise logic also presents challenges. For example, if a business user asks, “Show me total sales for the year,” what exactly does “year” mean? Is it a calendar year, a fiscal year, or year-to-date?

    Now imagine how much more complex the math becomes with a question like, “Show average sales for blue and red widgets for customers in the East, excluding store holidays.” Traditional best practices known by data professionals for decades provide a solid foundation for accurate math in AI-driven applications. These practices will continue to evolve as we design data architectures optimized for AI. Microsoft Fabric semantic models are a powerful tool for building that logic in a way that provides both context for accurate calculations and fast, efficient query performance.

    If you’re a data professional with skills in dimensional modeling, query optimization, ETL/ELT, RLS/OLS—your expertise is now more crucial than ever for AI solutions that require “good math.” This video explores strategic reasons for using semantic models as the foundation for AI when querying structured data. It walks through a use case that begins with 275 million rows of raw data, demonstrates how to model the data for AI, leverages tools in Fabric semantic models to prepare the data, and then serves it to AI tools and agents using Fabric Data Agents, Power BI Copilot, Azure AI Foundry, and Microsoft 365 Copilot. GitHub repo to (no code) deploy the demo in the video to Fabric with 275M rows of real data: https://github.com/isinghrana/fabric-samples-healthcare/tree/main/analytics-bi-directlake-starschema

    Here are a few helpful links:

    Official Microsoft Blog article on this topic: Power BI semantic models as accelerators for AI-enabled consumption | Microsoft Power BI Blog | Microsoft Power BI

    Related article on this topic from a Partner: AI in Power BI: Time to pay attention – SQLBI

  • SQL Saturday – Minnesota 2025 Presentation

    For many years, SQL Saturday conferences around the world have provided fantastic educational opportunities for professionals specializing in Microsoft tools such as Fabric, Power BI, and SQL Server. SQL Saturday Minnesota is coming up on September 27, 2025, and I was fortunate to be selected as a presenter. I’ll be presenting on best practices for Fabric and Power BI Semantic Models when designing them for use with Copilots and agentic AI tools such as AI Foundry and Copilot Studio.

    Hint: Best practices for AI with Semantic Models are going to be a hot topic in coming years and an area where dimensional modeling experts will be in high demand!

    My session is scheduled to precede the event raffle at the end of the day, and will be tailored for all audiences from beginners to advanced users. Many other Microsoft experts will also be presenting throughout the day:

    Title: Fabric Semantic Models are the Foundation for good math with AI

    Date: Saturday, 27 Sep 2025 3:15 pm – 4:15 pm CST (60 minutes)

    Location: St. Paul College, Saint Paul, Minnesota, United States

    Registration link: https://sqlsaturday.com/2025-09-27-sqlsaturday1124/

    Description:
    We are currently experiencing a generational surge in AI use cases and transformation. But will AI be able to connect directly to your raw data, perform all the necessary transformations, apply business-friendly names to fields, and add accurate logic to solutions? Does data engineering and architecture still matter?

    A well-known limitation of AI—specifically large language models (LLMs)—is that they are not fundamentally designed to perform accurate math. While newer LLMs can handle some mathematical tasks, query speeds are often slow, and the compute costs can be high. Translating the specialized context of natural language questions into precise logic also presents challenges. For example, if a business user asks, “Show me total sales for the year,” what exactly does “year” mean? Is it a calendar year, a fiscal year, or year-to-date? Now imagine how much more complex the math becomes with a question like, “Show average sales for blue and red widgets for customers in the East, excluding store holidays.”

    Traditional best practices known by data professionals for decades provide a solid foundation for accurate math in AI-driven applications. These practices will continue to evolve as we design data architectures optimized for AI. Microsoft Fabric semantic models are a powerful tool for building that logic in a way that provides both context for accurate calculations and fast, efficient query performance. If you’re a data professional with skills in dimensional modeling, query optimization, ETL/ELT, RLS/OLS—your expertise is now more crucial than ever for AI solutions that require “good math.”

    This presentation will explore strategic reasons for using semantic models as the foundation for AI when querying structured data. We’ll walk through a use case that begins with 275 million rows of raw data, demonstrates how to model the data for AI, leverages tools in Fabric semantic models to prepare the data, and then serves it to AI tools and agents using Fabric Data Agents, Power BI Copilot, Azure AI Foundry, and Microsoft 365 Copilot.

    Again, here is the registration link; it would be great to see you there: https://sqlsaturday.com/2025-09-27-sqlsaturday1124/

  • Unify data sharing across Snowflake, Databricks, and Fabric for a lakehouse trifecta

    This post examines strategies to reduce data duplication and enhance performance by enabling shared access to data at rest across Snowflake, Databricks, and Microsoft Fabric. These platforms are widely adopted in modern data ecosystems and each offers lakehouse capabilities that decouple compute from storage. In environments where multiple platforms coexist, sharing storage while orchestrating compute workloads across different engines can significantly reduce data duplication. This approach not only streamlines data management but also drives cost efficiency and can improve query performance.

    This post is the first in a three-part series focusing on interoperability amongst Snowflake, Databricks, and Microsoft Fabric. The following list will be updated with URLs as the posts are published:

    1. Unify data sharing across Snowflake, Databricks, and Fabric for a lakehouse trifecta (this article)
    2. Snowflake and Microsoft Fabric integration connectivity options (coming soon)
    3. Databricks and Microsoft Fabric integration connectivity options (coming soon)

    While consolidating data solutions onto a single platform is often ideal, many large organizations operate across multiple cloud environments due to team autonomy, legacy investments, or strategic diversification. Consider a scenario where Team B initiates a project to analyze the impact of supply chain disruptions on product sales. They manage and fund their analytics workloads on Cloud Platform B, which houses the sales data. However, the curated supply chain data resides on Cloud Platform A, owned by a separate team. With lakehouse architectures that support cross-platform data sharing, Team B can query data from Cloud Platform A using compute resources exclusively on Cloud Platform B. This model avoids unnecessary data replication, isolates compute to the platform using the data, reduces storage costs, and enables efficient cross-cloud analytics without compromising performance:

    Figure 1.1 – Compatible modern lakehouse architectures can restrict compute to one platform when sharing data across platforms

    When using a modern lakehouse cross-platform architecture as in Figure 1.1 above, the following benefits are possible:

    1. Cost containment by platform – Data from Cloud Platform A can be used by Cloud Platform B without incurring compute costs in Cloud Platform A. In the real world, a team with in-demand data in Cloud Platform A can share large volumes of data with many other teams without incurring additional costs.
    2. Minimize data duplication – With a lakehouse architecture using data sharing amongst compatible platforms, the replication of identical data can usually be reduced overall.
    3. Performance benefits – Sharing data amongst lakehouse cloud platforms can potentially reduce latency by eliminating unnecessary data copy steps.
    4. Platform flexibility – Large organizations won’t need to standardize on a single platform. Value can be realized faster amongst existing teams with diverse platforms. Data can also be integrated faster after mergers and acquisitions. Vendor lock-in risks can be avoided.

    Before listing out the options for data sharing amongst Snowflake, Databricks, and Microsoft Fabric please note the following:

    • I limited the options in this article to data sharing using lakehouse architectures. SQL endpoints (all three platforms have them), third party connectivity options, and other compute-to-compute options were left off this list. Fabric mirroring of Snowflake was included because it creates a lakehouse table as a carbon copy mirror of a Snowflake table. Upcoming posts listed above will cover other options beyond lakehouse storage.
    • I focused on lakehouse integration where the files are interoperable versions of delta parquet or iceberg. Other file formats can move across platforms, but metadata compatibility is key to minimizing data duplication for analytic scenarios.
    • At the time of writing this article, some of the options are still in Preview. I’ll try to update these articles as the status changes.
    • I left off options that are not “out of the box” for the three cloud platforms. For example, some customers will write delta parquet files to Azure Data Lake using Fabric and then reuse the files with Databricks. Other customers have successfully used Apache Iceberg change data capture tooling to shift Snowflake Iceberg tables to Fabric.
    • There are important details about the connectivity options left out of this article for the sake of simplicity. For example, options are impacted if Snowflake or Databricks run in a cloud other than Azure, or when private endpoint and private link capabilities are enabled on the platforms. If I covered every nuanced scenario, this article would become a book.
    • I consulted colleagues to confirm the accuracy of this list, but if anything is mis-stated or missing please let me know and I will make corrections.
    • These articles are not an attempt to compare or rank these three cloud platforms. At the time of writing this article I am a Microsoft employee, and all three products are fully supported on Microsoft Azure.

    The diagram below in Figure 1.2 may not initially be easy on the eyes, but if you follow each of the Lakehouse Connectivity Options one-at-a-time you can walk through the different ways to share lakehouse data amongst Snowflake, Databricks, and Microsoft Fabric:

    Figure 1.2 – Options for sharing lakehouse storage amongst Snowflake, Databricks, and Microsoft Fabric.

    Figure 1.3 below lists out each of the nine options above with details about availability status, potential use case scenarios, and details about where the data physically exists:

    Figure 1.3 – Details about the methods for sharing data amongst Snowflake, Databricks, and Microsoft Fabric

    For options 1-7 in Figure 1.3 above, the following list provides a reference link where you can learn more about each capability:

    Feature: Reference
    • Fabric mirroring of Snowflake DB (Copies Metadata & Data): Microsoft Fabric Mirrored Databases From Snowflake – Microsoft Fabric | Microsoft Learn
    • Snowflake write Iceberg to Fabric: CREATE EXTERNAL VOLUME | Snowflake Documentation
    • Fabric shortcut to Snowflake Iceberg: Use Iceberg tables with OneLake – Microsoft Fabric | Microsoft Learn
    • Fabric shortcut to Databricks unmanaged Delta Parquet: Unify data sources with OneLake shortcuts – Microsoft Fabric | Microsoft Learn
    • Fabric mirroring of Databricks Unity Catalog (just Metadata): Microsoft Fabric Mirrored Catalog From Azure Databricks – Microsoft Fabric | Microsoft Learn
    • Snowflake read table from Fabric as Iceberg: New in OneLake: Access your Delta Lake tables as Iceberg automatically (Preview) | Microsoft Fabric Blog | Microsoft Fabric
    • Databricks read Fabric Delta Parquet via Managed Identity: Integrate OneLake with Azure Databricks – Microsoft Fabric | Microsoft Learn
    Figure 1.4 – Reference links for the first seven options listed in Figure 1.2 and 1.3

    Notice that I left options 8-9 off of the reference list in Figure 1.4 because I was unable to locate official documentation pages from Databricks or Snowflake regarding those capabilities. If anyone has those links, let me know and I’ll add them to the table.

    Per the links at the beginning of this article, I will follow up this post with two more posts about 1) Fabric/Snowflake and 2) Fabric/Databricks integration that include connectivity options beyond shared lakehouse storage.

    In summary, solutions built on Snowflake, Databricks, or Fabric can share data across the platforms using lakehouse architecture tools to minimize data duplication, reduce data latency, and optimize costs without consolidating on a single platform.

  • Install an end-to-end 275M row Microsoft Fabric solution with a single Notebook

    I have been collaborating with my colleague Inder Rana on a GitHub repository for an end-to-end Microsoft Fabric solution for some time, as it provides a simple means to learn about Fabric with real open source healthcare data. This most recent update enables non-technical people who have never used Fabric to install a Fabric solution built to best practices in less than an hour for learning, testing, and exploration.

    The new Quick Setup Github guide is available at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema/quick-setup.md at main · isinghrana/fabric-samples-healthcare


    The most recent updates enable you to install the end-to-end solution using a single Fabric Notebook that can be downloaded from the GitHub repo, uploaded to Fabric, and then run in a blank Workspace. The full process takes less than an hour. The brilliant automation to enable this new deployment option was entirely the work of Inder Rana. You can network with and follow Inder on LinkedIn and Medium:
    https://linkedin.com/in/singhinderjit

    https://isinghrana.medium.com/


    In less than one hour, a single Fabric Notebook creates ELT Spark Notebooks, a Lakehouse with 275 million rows of real public data, a Data Factory Pipeline, a Semantic Model, and a Power BI report. You can add a Fabric Data Agent from the following link at the GitHub site: fabric-samples-healthcare/analytics-bi-directlake-starschema/docs/5-CreateAISkill.md at main · isinghrana/fabric-samples-healthcare


    The dataset used is the publicly available Medicare Part D Prescribers – by Provider and Drug, sourced from the Centers for Medicare & Medicaid Services (CMS): https://data.cms.gov/provider-summary-by-type-of-service/medicare-part-d-prescribers/medicare-part-d-prescribers-by-provider-and-drug
    Below is a video in which Inder walks you through the process of deploying the new Quick Install:

  • I’ve been sharing insights on data technologies through blogs, videos, and podcasts for over 15 years. Over time, my content has been published across a variety of platforms, some of which have since become outdated or difficult to access. To make things simpler, I’ve created this new blog as a centralized hub where you can find all of my past work, along with new content moving forward. It’s designed to be a single, reliable source for everything I’ve created past, present, and future in the world of data.

    Why am I consolidating links to my old content at this site?

    • With advances in AI, a consolidated record of my contributions to public content could be useful as a source of reference for my LLM projects and testing.
    • A few of the blogs that I used in the past have been deleted or broken. I have been able to link to most (sadly not all) archived versions of those articles via the Way Back Machine internet archive. Many data diagram images and old OLAP MDX code snippets are forever lost.
    • All of my future content, even if it is published to other sites, will be discoverable on this new site.

    Where do the 80+ posts of historical content come from?

    Feedback and suggestions are always welcome!

  • Query Fabric Data Agents from Azure AI Foundry for Agentic Solutions

    One of the early criticisms of Large Language Models (LLMs) was an inability to do good math when answering questions. Newer LLMs can do great reasoning and mathematical tasks, but at a high compute cost and the query times can be slow. Microsoft Fabric offers easy-to-use SaaS data tools that can scale to very large volumes of data and run traditional relational database queries. While custom RAG models (Retrieval-Augmented Generation) have been able to map natural language queries to structured databases, most implementations have been high-code custom designs.

    Microsoft Fabric is a vast suite of tools that are easy to use, but using the right tools for the right purpose can require some training and knowledge. My teammate Inder Rana and I created a GitHub repo that uses 250 million rows of real CMS open source Medicare Part D healthcare data, which can be found at this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · isinghrana/fabric-samples-healthcare . The entire GitHub repo can be deployed in a few hours without requiring a coding background.

    In the video below, I show you how to connect an Azure AI Foundry Agent to a Microsoft Fabric Data Agent which can be created with the GitHub repo. At the conclusion of the build, you can start asking questions of the AI Foundry Agent and getting mathematically correct answers from 250M rows of data!

  • Are you tired of watching Fabric demos where you can’t use the tools hands-on and push them to the limits? Would you like to test out Microsoft Fabric with 250 million rows of real data? Maybe you need a solid demo or you have a real Healthcare use case for 10 years of #OpenData CMS Medicare Part D data? This session will walk you through the steps to easily deploy a solution using Fabric Notebooks, OneLake (Lakehouse), Warehouse, Pipelines, a Direct Lake Semantic Model, Power BI, and with everything optimized for use with the Fabric Copilots. All you need is a Fabric Workspace and a Power BI Pro license. You can even spin up an F64, deploy the solution from GitHub (link here: https://github.com/isinghrana/fabric-… ), and then pause the capacity while not in use. Inder Rana and Greg Beaumont will walk you through the process by which to install the solution, end-to-end, without writing any code. The entire 250M row medallion Lakehouse architecture will even be fully deployed during the session, end-to-end from the CMS servers to a Gold Lakehouse in Fabric. Register for monthly events such as this one at this link: https://aka.ms/HLSPowerBI

  • Interested in trying out the new Microsoft Fabric AI Skill? Fabric AI Skill is a new Generative AI capability in preview for Fabric that functions as the basis for a SaaS AI Agent in Fabric using natural language to query your data. An AI Skill module has been added to the GitHub repo created along with my colleague Inderjit Rana.

    Why would you want to use AI Skill? AI Skill uses a pattern similar to a RAG large language model (LLM) so that context about your data and examples of proper SQL queries will provide a basis for the LLM to generate accurate queries that then run against your data such as “What was total costs in 2022 for Dallas, Texas?” or “Show me the top cities in Florida based on total beneficiaries?” 

    If you’ve already deployed the 250M row Lakehouse / Warehouse / Direct Lake solution from the repo, the new AI Skill content can be deployed in about 5-10 minutes. If you haven’t deployed the rest of the repo, steps in the repo documentation will guide you through the deployment process which should take less than an hour and result in an optimized Lakehouse and/or Warehouse that is ready for Fabric AI Skills. All that you need to deploy the entire solution is a Fabric Workspace that uses a Fabric or Premium capacity with the Fabric tools enabled.

    I’m currently working on an in-depth article to review use cases for AI Skill, but for now you can get started by using the GitHub repo at this link: https://github.com/isinghrana/fabric-samples-healthcare/tree/main/analytics-bi-directlake-starschema  

    A video walking through a demo of the AI Skill module, along with a walkthrough of the deployment process can be found here: https://youtu.be/ftout8UX4lg 

  • If you come from a SQL background, the Fabric Warehouse is made for you.

    This article discusses Stored Procedures (SPROCs) in the Fabric Warehouse, explains how they can benefit customers with SQL skills, and provides a video demonstrating the speed and power of SPROCs in the Fabric Warehouse.

    Before Fabric Pipelines and Dataflows, before Azure Data Factory, and before SQL Server Integration Services (SSIS), there were Stored Procedures on SQL Server. For data professionals who are literate in SQL, Stored Procedures are a simple and reliable way to perform tasks such as creating and updating tables, running queries for data transformations, and more. 

    In the Healthcare industry segment, especially with Providers, both on-premises and cloud SQL Server deployments are still a popular and reliable platform for data and analytics. Many data professionals working in these environments write and think about the SQL language more frequently than they do spoken languages.

    A great deal of the press and media for Microsoft Fabric has focused on Spark Notebooks (deservedly so, Spark Notebooks are fantastic). While you can write SQL in Spark Notebooks, the most popular language for Notebooks is Python which is very different from the SQL Server world many of us grew up embracing. I’ve had some customers tell me that “Fabric looks promising, but our SQL experts will need to re-skill for Spark Notebooks. Also, our SQL Views and Stored Procedures will need to be rewritten.” The Fabric Warehouse, often overlooked by the press and media, is designed in the spirit of a SQL Server environment. For some use cases, it provides a great path for moving to Fabric without re-skilling and re-writing queries.

    Along with my colleague Inder Rana, we recently created a GitHub Repo for deploying 249 million rows of real CMS data to a Fabric environment for testing and demos. The end-to-end solution takes about 20 minutes to deploy using fully automated scripts in Spark Notebooks, and then has instructions to deploy the Power BI components. As an alternative to the Spark Notebooks, there is also an option to deploy the Gold layer of the medallion architecture to the Fabric Warehouse using Stored Procedures. You can try out the GitHub Repo from this link: fabric-samples-healthcare/analytics-bi-directlake-starschema at main · gregbeaumont/fabric-samples-healthcare  
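
    To give a sense of what the Warehouse option looks like for SQL folks, here is a Python sketch that connects to a Fabric Warehouse SQL endpoint, then creates and runs a simple Gold-layer stored procedure. This is an illustrative shape only, not the actual procedures from the GitHub repo; the connection string, schema, table, and column names are placeholders.

        # Hedged sketch: create and run a T-SQL stored procedure in a Fabric Warehouse from Python.
        # Endpoint, database, and object names are placeholders; the auth method depends on your setup.
        import pyodbc

        conn = pyodbc.connect(
            "Driver={ODBC Driver 18 for SQL Server};"
            "Server=<warehouse-sql-endpoint>.datawarehouse.fabric.microsoft.com;"
            "Database=<warehouse-name>;"
            "Authentication=ActiveDirectoryInteractive;",
            autocommit=True,
        )
        cur = conn.cursor()

        # A simple Gold-layer transformation: rebuild a dimension table from a staging table
        cur.execute("""
            CREATE OR ALTER PROCEDURE dbo.usp_Load_DimDrug AS
            BEGIN
                DROP TABLE IF EXISTS dbo.DimDrug;
                CREATE TABLE dbo.DimDrug AS
                SELECT DISTINCT Brnd_Name, Gnrc_Name
                FROM dbo.PartD_Prescribers_Staging;
            END
        """)
        cur.execute("EXEC dbo.usp_Load_DimDrug;")
        conn.close()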

    Initially, we offered the Stored Procedures option as an alternative to Spark Notebooks for the purposes of 1) demonstrating different options in Fabric, 2) providing a path that SQL experts would appreciate, and 3) comparing semantic model query performance in the Warehouse versus the Lakehouse. When testing, the Stored Procedures often ran twice as fast as the Spark Notebooks! The results were at first surprising, and after validating that the Warehouse Stored Procedures are indeed very powerful, I was inspired to create this video and blog post.

    If you come from a SQL background and you prefer to write SQL, or if you are assessing migration of analytic workloads to Fabric but you don’t want to convert your Stored Procedures and Views to Python, the Fabric Warehouse is a powerful option and worth the effort to evaluate. 

    Below is a link to the video that walks through deployment of Stored Procedures that transform a flattened table of data (249M rows) into a star schema with four dimensions and a fact table in the Fabric Warehouse that are optimized for Power BI query performance: