Featured Blog Posts – AnalyticBridge

2019-05-03T12:22:22Z

The graph visualization landscape 2019
2019-04-09T10:00:00.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux


<p><span style="font-size: 18pt;"><strong>Graphs are meant to be seen</strong></span></p>
<p><span><br/> The third layer of graph technology that we discuss in this article is the front-end layer: graph visualization. The visualization of information has supported many types of analysis, including </span><a href="https://en.wikipedia.org/wiki/Social_network_analysis"><span>Social Network Analysis</span></a><span>. For decades, visual representations have helped researchers, analysts and enterprises derive insights from their data.</span></p>
<p><span><strong><br/> Visualization tools represent an important bridge between graph data and analysts. They help surface information and insights, leading to the understanding of a situation or the solving of a problem.</strong></span></p>
<p><span><br/> While it’s easy to read and comprehend non-graph data in a tabular format such as a spreadsheet, you will probably miss valuable information if you try to analyze connected data the same way. Representing connected data in tables is not intuitive and often hides the connections in which the value lies. Graph visualization tools turn connected data into graphical network representations that take advantage of the human brain’s proficiency at recognizing visual patterns and their variations.</span></p>
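<p><span>As a rough illustration (hypothetical data, plain Python rather than any particular visualization library), here is how a connection that stays invisible in a row-per-transaction table surfaces as soon as the same data is restructured as a graph:</span></p>

```python
# A row-per-transaction table: the link between 'alice' and 'bob'
# (a shared counterparty) is easy to miss when scanning rows.
transactions = [
    {"from": "alice", "to": "acme_corp"},
    {"from": "bob", "to": "acme_corp"},
    {"from": "alice", "to": "cafe_9"},
]

# The same data as an undirected adjacency map -- the structure
# a graph visualization tool lays out on screen.
graph = {}
for t in transactions:
    graph.setdefault(t["from"], set()).add(t["to"])
    graph.setdefault(t["to"], set()).add(t["from"])

# Shared neighbours jump out of the graph view, not the table view.
print(graph["alice"] & graph["bob"])  # {'acme_corp'}
```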
<p><span><br/> In the field of graph theory and network science, researchers started to imagine graph analysis and visualization tools as early as 1996 with the </span><a href="http://mrvar.fdv.uni-lj.si/pajek/history.htm"><span>Pajek</span></a><span> project. Even though these applications long remained confined to the field of research, they marked the birth of computer tools for graph visualization.</span></p>
<div class="wp-caption aligncenter"><a href="https://storage.ning.com/topology/rest/1.0/file/get/1826401034?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1826401034?profile=RESIZE_710x" class="align-center"/></a></div>
<div id="attachment_7174" class="wp-caption aligncenter"><p class="wp-caption-text" style="text-align: center;"><em>The Pajek software, initiated in 1996</em></p>
</div>
<p><span style="font-size: 18pt;"><strong>Visualization speeds up data analysis</strong></span></p>
<p><span><br/> There is a reason researchers started to develop these tools. As we previously wrote, </span><a href=”https://linkurio.us/blog/why-graph-visualization-matters/”><span>graph visualization is critical for the analysis of graph data</span></a><span>. When you apply visualization methods to data analysis, you are more likely to cut the time spent looking for information because:<br/> <br/></span></p>
<ul>
<li><span>You have a greater ability to recognize trends &amp; patterns.</span></li>
<li><span>You can digest larger amounts of data more easily.</span></li>
<li><span>You can compare situations or scenarios more easily.</span></li>
<li><span>It is easier to share and explain your findings through a visual medium.<br/> <br/></span></li>
</ul>
<p><span>Combined with the capabilities of modern computers, these advantages opened new doors for analysts seeking information in large volumes of data. It is also the reason graph visualization solutions are complementary to the <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-2-graph-analytics/">graph analytics</a> and <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/">graph databases tools</a> we discussed in the previous articles. Once data is stored and calculations are done, end-users need an intelligible way to process and make sense of the data. And graph visualization tools are useful in many scenarios.<br/> <br/></span></p>
<p><span>You need to </span><a href=”https://linkurio.us/blog/big-data-technology-fraud-investigations/”><span>identify shady financial schemes in terabytes of data</span></a><span>? Graph data visualization. You need to </span><a href=”https://linkurio.us/blog/critical-threats-project-delivers-timely-intelligence-linkurious/”><span>understand the human dynamic between criminal networks</span></a><span>? Graph data visualization. You need to quickly </span><a href=”https://linkurio.us/blog/bforbank-detects-fraud-with-linkurious/”><span>assess the fraudulence of flagged transactions</span></a><span>? You guessed it, graph visualization.<br/> <br/></span></p>
<p><span>Most of the tools we are about to present can be plugged directly into database and analytics systems to further the analysis of graph data.<br/> <br/></span></p>
<p><span style="font-size: 18pt;"><strong>Graph visualization libraries and toolkits</strong></span></p>
<p>Among the common tools available today to visualize graph data are libraries and toolkits. These libraries allow you to build a custom visualization application adjusted to your needs: from a basic graph layout displaying data in your browser, to an advanced application embedding a full panel of graph data customization and analysis features. They do require knowledge of programming languages, or that you have development resources available.</p>
<p></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1826436756?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1826436756?profile=RESIZE_710x” class=”align-center”/></a></p>
<p class=”wp-caption-text” style=”text-align: center;”><em>The graph visualization libraries and toolkit ecosystem</em></p>
<p></p>
<p><span>The catalog is wide, with plenty of choices depending on your favorite language, license requirements, budget or project needs. In the open-source world, some libraries offer many possibilities for data visualization, including graph, or network, representations. This is the case of </span><a href="https://d3js.org/"><span>D3.js</span></a><span> and </span><a href="http://visjs.org/"><span>Vis.js</span></a><span>, for instance, which let you choose among different data representation formats.<br/> <br/></span></p>
<p><span>Other libraries focus solely on graph representations of data, such as </span><a href="http://js.cytoscape.org/"><span>Cytoscape.js</span></a><span> or </span><a href="http://sigmajs.org/"><span>Sigma.js</span></a><span>. Usually, these libraries provide more features than the generalist ones. There are libraries in Java such as </span><a href="http://graphstream-project.org/"><span>GraphStream</span></a><span> or </span><a href="http://jung.sourceforge.net/"><span>Jung</span></a><span>, or in Python, with packages like </span><a href="https://www.nodebox.net/code/index.php/Graph"><span>NodeBox Graph</span></a><span>.<br/> <br/></span></p>
<p><span>You will also find commercial graph visualization libraries such as </span><a href="https://www.yworks.com/"><span>yFiles</span></a><span> from yWorks, </span><a href="https://cambridge-intelligence.com/keylines/"><span>Keylines</span></a><span> from Cambridge Intelligence, </span><a href="https://www.tomsawyer.com/perspectives/"><span>Tom Sawyer Perspectives</span></a><span> from Tom Sawyer Software, or our own solution </span><a href="http://ogma.linkurio.us/"><span>Ogma</span></a><span>. The commercial libraries have the advantage of guaranteeing continuous technical support and advanced performance.<br/> <br/></span></p>
<p><span style="font-size: 18pt;"><strong>Graph visualization software and web applications</strong></span></p>
<p><span><span style="font-size: 14pt;">Research applications <br/></span> <br/></span> <span>There are other solutions which do not require any development. These solutions are either SaaS or on-premise software and web applications. As we mentioned earlier, the first off-the-shelf solutions spawned from the work of network theory researchers. After Pajek, other solutions were released, such as </span><a href="http://www.netminer.com/product/overview.do"><span>NetMiner</span></a><span> in 2001, commercial software for exploratory analysis and visualization of large network data. In the same line, the </span><a href="https://gephi.org/"><span>Gephi software</span></a><span>, created in 2008, brought a powerful open source tool to many researchers in the field of </span><a href="https://en.wikipedia.org/wiki/Social_network_analysis"><span>Social Network Analysis</span></a><span>. Co-founded by Linkurious’ CEO, Sébastien Heymann, Gephi played a key role in democratizing graph visualization methods.<br/> <br/></span></p>
<p><span>Other research projects emerged, as web technologies simplified their creation. For instance, </span><a href=”http://hdlab.stanford.edu/palladio/about/”><span>Palladio</span></a><span>, a graph visualization web application for history researchers was created in 2013. More recently in 2016, the </span><a href=”https://osome.iuni.iu.edu/tools/networks/”><span>research project OSoMe</span></a><span> (the Observatory on Social Media) released an online graph visualization application to study the spread of information and misinformation on social media.<br/> <br/></span></p>
<p><span>However, graph visualization is no longer the preserve of the academic and research worlds. Others understood the potential of graph visualization and how such tools could help organizations and businesses in other fields: network management, financial crime investigation, cybersecurity, healthcare development, and more. Companies started to provide enterprise-ready graph visualization solutions, as did </span><a href="https://linkurio.us/blog/official-launch/"><span>Linkurious back in 2013</span></a><span>.<br/> <br/></span></p>
<p><span><span style="font-size: 14pt;">Generic and field-specific solutions</span><br/> <br/> Today you can easily find software or web applications to visualize graph data of various natures. </span><a href="http://www.bakamap.com/"><span>Bakamap</span></a><span> is a web application to visualize your spreadsheet data as interactive graphs. The cloud-based application </span><a href="https://begraph.net/"><span>BeGraph</span></a><span> offers a 3D data network visualizer. Historical open-source software such as </span><a href="https://www.graphviz.org/"><span>GraphViz</span></a><span> and </span><a href="https://cytoscape.org/what_is_cytoscape.html"><span>Cytoscape</span></a><span> also let you visualize any type of data as interactive graphs.<br/> <br/></span></p>
<p><span>Some companies propose solutions dedicated to certain use-cases. In these cases, the graph visualization application is often enhanced with features specially designed to answer needs specific to the given field. For instance, the </span><a href="https://linkurio.us/product/"><span>Linkurious Enterprise graph visualization platform</span></a><span> is dedicated today to anti-fraud, anti-money laundering, intelligence analysis, and cybersecurity scenarios. So in addition to graph visualization, it proposes alerts and pattern detection capabilities to support the work of analysts in these fields. Another example of a field-specific tool is </span><a href="https://vis.occrp.org/"><span>VIS</span></a><span> (Visual Investigative Scenarios), a tool designed by the OCCRP for journalists investigating major business or criminal networks. </span><a href="https://seeyournetwork.com/"><span>Synapp</span></a><span>, on the other hand, is an application dedicated to the visualization of human resources within organizations. As the adoption of graph technology spreads, more and more areas are witnessing the development of specific graph visualization tools.<br/> <br/></span></p>
<p><span style="font-size: 18pt;"><strong>Built-in visualizers and other add-ons</strong></span></p>
<p><span>Finally, the last set of tools dedicated to the visualization of graph data are the built-in visualizers and graph database add-ons.<br/> <br/></span></p>
<div id="attachment_7176" class="wp-caption aligncenter"><a href="https://storage.ning.com/topology/rest/1.0/file/get/1826482689?profile=original" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1826482689?profile=RESIZE_710x" class="align-center"/></a><br/> <br/><p class="wp-caption-text" style="text-align: center;"><em>Built-in graph visualizers and add-ons</em></p>
</div>
<p><span>While graph visualization software and web applications are great for in-depth analysis or advanced graph data investigations, there are situations where you simply need a basic, accessible visualizer to get a glimpse of what a given graph dataset looks like. That is why some graph databases ship with built-in graph data visualizers. These features are a great asset for developers and data engineers working with graph data. Without leaving the graph database environment, you can easily access a graphical user interface to query and visualize your data. This is what the </span><a href=”https://neo4j.com/developer/guide-neo4j-browser/”><span>Neo4j browser</span></a><span> offers for instance, which can be of great help when creating datasets or running graph algorithms. Similarly, TigerGraph proposes a built-in graphical user interface: </span><a href=”https://www.tigergraph.com/category/graphstudio/”><span>GraphStudio</span></a><span> to visualize your database content. </span><a href=”https://bitnine.net/blog-agens-solution/blog-agensbrowser/announcing-agensbrowser-web-v-1-0-release/”><span>Last year, Bitnine released AgensBrowser,</span></a><span> a visualization interface to help you manage and monitor the content of your AgensGraph graph database.<br/> <br/></span></p>
<p><span>On a similar note, graph database vendors have started to widen their offerings with add-on visualization tools compatible with their storage products. For example, at the beginning of last year, Neo4j launched </span><a href="https://neo4j.com/bloom/"><span>Bloom</span></a><span>, an add-on application for the Neo4j desktop application. It offers a code-free visualization interface to explore data from Neo4j graph databases.<br/> <br/></span></p>
<p><span>The following presentation lists most of the visualization tools for graph data. </span><strong><a href="https://linkurio.us/resource/graph-visualization-software/">You can request the complete list of software and web applications on Linkurious’ blog.<br/></a> <br/> <a href="https://fr.slideshare.net/Linkurious/graphtech-ecosystem-part-2-graph-visualization-139711307" target="_blank" rel="noopener"><img src="https://storage.ning.com/topology/rest/1.0/file/get/1826520404?profile=RESIZE_710x" class="align-center"/></a></strong></p>
<p style=”text-align: center;”><b>This post was initially published on <a href=”https://linkurio.us/blog/graphtech-ecosystem-part-3-graph-visualization/?utm_source=analyticbridge&amp;utm_medium=post&amp;utm_content=part3″ target=”_blank” rel=”noopener”>Linkurious blog.</a></b></p>
<p style=”text-align: center;”></p>
<p><span>Read part 1: <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticbridge&amp;utm_medium=post&amp;utm_content=part3" target="_blank" rel="noopener">The graph database landscape 2019</a></span></p>
<p><span>Read part 2: <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-2-graph-analytics/?utm_source=analyticbridge&amp;utm_medium=post&amp;utm_content=part3" target="_blank" rel="noopener">The graph analytics landscape 2019</a></span></p>



The importance of Alternative Data in Credit Risk Management
2019-03-26T05:15:15.000Z


Naagesh Padmanaban
https://www.analyticbridge.datasciencecentral.com/profile/NaageshPadmanaban


<p><em>The emergence of alternative data as a key enabler in expanding credit delivery and financial inclusion is unmistakable.</em></p>
<p>The saying that the only constant is change is attributed to Heraclitus, the Greek philosopher. It is very relevant today in the way lenders use technology and scoring solutions to understand the creditworthiness of applicants. Credit Risk Management has come a long way from the days when banks used just one credit score cutoff to decision loan applications. Risk managers now have a plethora of solution options to enable them to craft the right risk-reward balance when they design a credit policy that suits them.</p>
<p>It is common knowledge that large volumes of data are being constantly generated and a good portion of this can be used to better understand a potential borrower. This profusion of data has only provided greater depth and reach to lenders.</p>
<p>The emergence of alternative data as a key enabler in expanding credit delivery and financial inclusion is unmistakable. It not only expands the scorable population, but also deepens the understanding of their payment behavior. The three credit bureaus, realizing the value of this data asset, have embarked on an acquisition spree.</p>
<p>A basic definition of traditional data as well as alternative data will help understand the scenario better.</p>
<p><strong>Traditional Data</strong></p>
<p>Traditional data typically refers to data that credit bureaus maintain on their files. This includes data provided by the customer in the loan applications, data on credit lines, loan repayment history, credit enquiries as well as public information like bankruptcies. Traditional data is FCRA compliant and the acid test is that it must be verifiable and disputable by the customer.</p>
<p>Industry research has shown that scoring solutions that use traditional data cannot score a significant section of the population. According to the Consumer Financial Protection Bureau (CFPB), these ‘credit invisibles’ number over 45 million people. It further points out that although this segment of the population may not have a regular loan payment track record, they may still be paying their other bills regularly. It is thus very important to track this payment history – e.g. utility payments – to estimate their credit risk.</p>
<p><strong>Alternative Data</strong></p>
<p>Definitions of alternative data may vary, depending on where you choose to look them up. But in a broad sense it pertains to data that includes, but is not limited to, rent payments, mobile phone payments, cable TV payments, as well as bank account information such as deposits, withdrawals or transfers.</p>
<p>While alternative data has a very important role in financial inclusion, it also has other important benefits. In addition to improving the assessment of customer risk, it can provide timely information to lenders on activities that may not be reflected in bureau data. Further, it enables lenders to provide an enhanced customer experience. For example, when customers share online bank account information, loan application processing may be faster.</p>
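<p><span>As a purely illustrative sketch (the field names, values and additive weighting are invented, not an actual scoring model), combining traditional and alternative payment signals might look like this:</span></p>

```python
# Hypothetical sketch: counting positive payment signals from both
# traditional and alternative data. All field names are illustrative.
def payment_signals(profile):
    """Sum on-time payment counts across traditional and alternative data."""
    signals = 0
    signals += profile.get("loans_repaid_on_time", 0)      # traditional (bureau)
    signals += profile.get("utility_payments_on_time", 0)  # alternative
    signals += profile.get("rent_payments_on_time", 0)     # alternative
    return signals

# A 'credit invisible' with no loan history still shows payment behavior
# once alternative data is taken into account.
thin_file = {"utility_payments_on_time": 24, "rent_payments_on_time": 12}
print(payment_signals(thin_file))  # 36
```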
<p>Like traditional data, alternative data too is susceptible to inaccuracies. Consumers may not be able to readily review and correct alternative data, although the standards governing it are constantly changing and evolving to meet customer and regulatory expectations.</p>



Fascinating Developments in the Theory of Randomness
2019-03-21T13:30:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville


<p>I present here some innovative results from my most recent research on stochastic processes, chaos modeling, and dynamical systems, with applications to Fintech, cryptography, number theory, and random number generators. While covering advanced topics, this article is accessible to professionals with limited knowledge of statistical or mathematical theory. It introduces new material not covered in my recent book (available <a href="https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes" target="_blank" rel="noopener">here</a>) on applied stochastic processes. You don’t need to read my book to understand this article, but the book is a nice complement and introduction to the concepts discussed here.</p>
<p>None of the material presented here is covered in standard textbooks on stochastic processes or dynamical systems. In particular, it has nothing to do with the classical logistic map or Brownian motions, though the systems investigated here exhibit very similar behaviors and are related to the classical models. This cross-disciplinary article is targeted to professionals with interests in statistics, probability, mathematics, machine learning, simulations, signal processing, operations research, computer science, pattern recognition, and physics. Because of its tutorial style, it should also appeal to beginners learning about Markov processes, time series, and data science techniques in general, offering fresh, off-the-beaten-path content not found anywhere else, contrasting with the material covered again and again in countless, identical books, websites, and classes catering to students and researchers alike. </p>
<p></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1529825331?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1529825331?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>Some problems discussed here could be used by college professors in the classroom, or as original exam questions, while others are extremely challenging questions that could be the subject of a PhD thesis or even well beyond that level. This article constitutes (along with my book) a stepping stone in my endeavor to solve one of the biggest mysteries in the universe: are the digits of mathematical constants such as Pi evenly distributed? To this day, no one knows if these digits even have a distribution to start with, let alone whether that distribution is uniform or not. Part of the discussion is about statistical properties of numeration systems in a non-integer base (such as the golden ratio base) and their applications. All systems investigated here, whether deterministic or not, are treated as stochastic processes, including the digits in question. They all exhibit strong chaos, albeit easily manageable due to their ergodicity.<span> </span></p>
<p>Interesting connections to the golden ratio, Fibonacci numbers, Pisano periods, special polynomials, Brownian motions, and other special mathematical constants, are discussed throughout the article. All the analyses were done in Excel. You can download my spreadsheets from this article; all the results are replicable. Also, numerous illustrations are provided. </p>
<p></p>
<p><a href=”https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness” target=”_blank” rel=”noopener”>Read the full article here</a>.</p>
<p><strong>Content of this article</strong></p>
<p>1. General framework, notations and terminology</p>
<ul>
<li>Finding the equilibrium distribution</li>
<li>Auto-correlation and spectral analysis</li>
<li>Ergodicity, convergence, and attractors</li>
<li>Space state, time state, and Markov chain approximations</li>
<li>Examples</li>
</ul>
<p>2. Case study</p>
<ul>
<li>First fundamental theorem</li>
<li>Second fundamental theorem</li>
<li>Convergence to equilibrium: illustration</li>
</ul>
<p>3. Applications</p>
<ul>
<li>Potential application domains</li>
<li>Example: the golden ratio process</li>
<li>Finding other useful b-processes</li>
</ul>
<p>4. Additional research topics</p>
<ul>
<li>Perfect stochastic processes</li>
<li>Characterization of equilibrium distributions (the attractors)</li>
<li>Probabilistic calculus and number theory, special integrals</li>
</ul>
<p>5. Appendix</p>
<ul>
<li>Computing the auto-correlation at equilibrium</li>
<li>Proof of the first fundamental theorem</li>
<li>How to find the exact equilibrium distribution</li>
</ul>
<p>6. Additional Resources</p>
<p><a href=”https://www.datasciencecentral.com/profiles/blogs/fascinating-new-results-in-the-theory-of-randomness” target=”_blank” rel=”noopener”>Read the full article here</a>.</p>



The graph analytics landscape 2019
2019-02-27T12:00:00.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux



<h1 style="text-align: left;"><span style="font-size: 12pt;">Read part 1 – <a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&amp;utm_medium=article&amp;utm_content=07" target="_blank" rel="noopener">The graph database landscape</a></span></h1>
<h1 style="text-align: center;"><strong>The graph analytics landscape 2019</strong></h1>
<p><span>Graph analytics frameworks consist of a set of tools and methods developed to extract knowledge from data modeled as a graph. They are crucial for many applications because processing large datasets of complex connected data is computationally challenging. </span></p>
<p></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1213658813?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1213658813?profile=RESIZE_710x” class=”align-full”/><br/></a></span></p>
<h2><span style=”font-size: 18pt;”><strong>A need for analytics at scale</strong></span></h2>
<p><span>The field of graph theory has spawned multiple algorithms on which analysts can rely to find insights hidden in graph data. From Google’s famous </span><a href="https://en.wikipedia.org/wiki/PageRank"><span>PageRank algorithm</span></a><span> to traversal and path-finding algorithms or community detection algorithms, there are plenty of calculations available to get insights from graphs.</span></p>
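<p><span>For intuition, the core of PageRank reduces to a short power-iteration loop. The sketch below (plain Python, a toy three-node graph, the standard damping factor of 0.85) is a minimal rendition of the idea, not a production implementation:</span></p>

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each node to its out-links."""
    n = len(links)
    rank = {node: 1.0 / n for node in links}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in links}
        for node, outgoing in links.items():
            share = damping * rank[node] / len(outgoing)
            for target in outgoing:   # each node passes rank to its targets
                new[target] += share
        rank = new
    return rank

# 'c' is pointed to by both 'a' and 'b', so it ends up ranked highest.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(max(rank, key=rank.get))  # 'c'
```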
<p><span>The graph database storage systems we mentioned in </span><a href="https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&amp;utm_medium=article&amp;utm_content=07" target="_self">the previous article</a><span> are good at storing data as graphs and at managing operations such as data retrieval, real-time queries, or local analysis. But they might fall short on graph analytics processing at scale. That’s where graph analytics frameworks step in. Shipping with common graph algorithms, processing engines and, sometimes, query languages, they handle online analytical processing and persist the results back into databases.<br/> <br/></span></p>
<h2><span style="font-size: 18pt;"><strong>Graph processing engines</strong></span></h2>
<p><span>The graph processing ecosystem offers various approaches to answer the challenges of graph analytics, and historical players occupy a large part of the market.</span></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1213663551?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1213663551?profile=RESIZE_710x” class=”align-full”/></a></span></p>
<p><span>In 2010, Google led the way with the </span><a href="https://dl.acm.org/citation.cfm?id=1807184"><span>release of Pregel</span></a><span>, a “large-scale graph processing” framework. Several solutions followed, such as </span><a href="https://giraph.apache.org/"><span>Apache Giraph</span></a><span>, an open source graph processing system developed in 2012 by the Apache foundation. It leverages a MapReduce implementation to process graphs and is the system used by Facebook to traverse its social graph. Other open source systems iterated on Google’s, for example </span><a href="https://thegraphsblog.wordpress.com/the-graph-blog/mizan/"><span>Mizan</span></a><span> or </span><a href="http://infolab.stanford.edu/gps/"><span>GPS</span></a><span>.</span></p>
<p><span>Other systems, like </span><a href=”https://github.com/GraphChi”><span>GraphChi</span></a><span> or </span><a href=”http://www.powergraph.ru/en/soft/demo.asp”><span>PowerGraph Create</span></a><span>, were launched following GraphLab’s release in 2009. This system started as an open-source project at Carnegie Mellon University and is now known as </span><a href=”https://turi.com/”><span>Turi</span></a><span>.  </span></p>
<p><span>Oracle Lab developed </span><a href=”https://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html”><span>PGX</span></a><span> (Parallel Graph AnalytiX), a graph analysis framework including an analytics processing engine powering Oracle Big Data Spatial and Graph.</span></p>
<p><span>The distributed open source graph engine Trinity, presented in 2013 by Microsoft, is now known as </span><a href="https://www.graphengine.io/"><span>Microsoft Graph Engine</span></a><span>. </span><a href="https://spark.apache.org/graphx/"><span>GraphX</span></a><span>, introduced in 2014, is the embedded graph processing framework built on top of </span><a href="https://spark.apache.org/"><span>Apache Spark</span></a><span> for parallel computation. Some other systems have since been introduced, for example </span><a href="https://github.com/uzh/signal-collect"><span>Signal/Collect</span></a><span>.<br/> <br/></span></p>
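<p><span>The common thread of these engines is Pregel's vertex-centric ("think like a vertex") model: computation proceeds in supersteps where each vertex processes incoming messages, updates its state, and messages its neighbors, until no messages remain. The toy, single-machine sketch below illustrates only the model, not a distributed engine:</span></p>

```python
def propagate_max(edges, values):
    """Vertex-centric sketch: every vertex learns its component's maximum value."""
    values = dict(values)
    # Superstep 0: each vertex sends its value along its out-edges.
    messages = {v: [] for v in values}
    for v, out in edges.items():
        for w in out:
            messages[w].append(values[v])
    # Run supersteps until no messages are in flight (all vertices halt).
    while any(messages.values()):
        new_messages = {v: [] for v in values}
        for v, incoming in messages.items():
            if incoming and max(incoming) > values[v]:
                values[v] = max(incoming)          # state change reactivates v
                for w in edges.get(v, []):
                    new_messages[w].append(values[v])
        messages = new_messages
    return values

edges = {1: [2], 2: [3], 3: [1]}
print(propagate_max(edges, {1: 5, 2: 9, 3: 1}))  # {1: 9, 2: 9, 3: 9}
```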
<h2><span style="font-size: 18pt;"><strong>Graph analytics libraries and toolkits</strong></span></h2>
<p><span>In the graph analytics landscape, there are also single-user systems dedicated to graph analytics. Graph analytics libraries and toolkits provide implementations of a number of algorithms from graph theory.<br/> <br/></span></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1213665836?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1213665836?profile=RESIZE_710x” class=”align-full”/></a></span></p>
<p></p>
<p><span>There are standalone libraries such as </span><a href=”https://networkx.github.io/”><span>NetworkX</span></a><span> and </span><a href=”https://networkit.github.io/”><span>NetworKit</span></a><span>, Python libraries for large-scale graph analysis, or </span><a href=”https://igraph.org/redirect.html”><span>iGraph</span></a><span>, a graph library written in C and available as Python and R packages, as well as libraries provided by graph database vendors, such as Neo4j with its </span><a href=”https://neo4j.com/graph-machine-learning-algorithms/”><span>Graph Algorithms Library</span></a><span>.</span></p>
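<p>As a quick illustration of what these libraries offer, here is a minimal sketch using NetworkX; the small follower graph below is invented for the example:</p>

```python
# Minimal sketch of single-machine graph analytics with NetworkX,
# one of the standalone libraries mentioned above.
import networkx as nx

# A tiny, made-up directed graph of follower relationships.
G = nx.DiGraph([("alice", "bob"), ("bob", "carol"),
                ("carol", "alice"), ("dave", "carol")])

ranks = nx.pagerank(G)                       # importance of each node
betweenness = nx.betweenness_centrality(G)   # brokerage position of each node

# "carol" receives the most incoming links, so she scores highest.
print(max(ranks, key=ranks.get))
```

<p>A few lines are enough to run classic graph-theory algorithms that would be tedious to express over tabular data.</p>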
<p><span>Other technology vendors offer libraries for high-performance graph analytics. This is the case for the GPU technology provider NVIDIA with its </span><a href=”https://developer.nvidia.com/nvgraph”><span>NVGraph library</span></a><span>. The geographic information software QGIS also built its own </span><a href=”https://docs.qgis.org/testing/en/docs/pyqgis_developer_cookbook/network_analysis.html#graph-analysis”><span>library for network analysis</span></a><span>.</span></p>
<p><span>Some of these libraries also propose graph visualization tools to help users build graph data exploration interfaces, but this is a topic for the third post of this series.<br/> <br/></span></p>
<h2><span style=”font-size: 18pt;”><strong>Graph query languages</strong></span></h2>
<p><span>Finally, there is one important piece of the analytics framework that we have not mentioned yet: graph query languages.</span></p>
<p><span>As with any storage system, query languages are an essential element for graph databases. These languages make it possible to model the data as a graph, and their logic is very close to the graph data model. In addition to the data modeling process, graph query languages are used to query data. Depending on their nature, they can be used against database systems or as domain-specific analytics languages. Most high-level computing engines also allow users to write queries in these languages.</span></p>
<p></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1213668117?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1213668117?profile=RESIZE_710x” class=”align-full”/></a></span></p>
<p><a href=”https://neo4j.com/developer/cypher-query-language/”><span>Cypher</span></a><span> was created in 2011 by Neo4j to use on their own database. It has been </span><a href=”https://neo4j.com/blog/open-cypher-sql-for-graphs/”><span>open-sourced in 2015</span></a><span> as a separate project named </span><a href=”https://www.opencypher.org/”><span>OpenCypher</span></a><span>. Other notable graph query languages are </span><a href=”https://tinkerpop.apache.org/gremlin.html”><span>Gremlin</span></a><span>, the graph traversal language of Apache TinkerPop, created in 2009, or </span><a href=”https://jena.apache.org/tutorials/sparql.html”><span>SPARQL</span></a><span>, the SQL-like language created by the W3C in 2008 to query RDF graphs. More recently, TigerGraph developed its own graph query language named </span><a href=”https://www.tigergraph.com/2018/05/22/crossing-the-chasm-eight-prerequisites-for-a-graph-query-language/”><span>GSQL</span></a><span> and Oracle created </span><a href=”http://pgql-lang.org/”><span>PGQL</span></a><span>, both SQL-like graph query languages. </span><a href=”https://arxiv.org/abs/1712.01550″><span>G-Core</span></a><span> was proposed by the Linked Data Benchmark Council (LDBC) in 2018 as a language bridging the academic and industrial worlds. Other vendors, such as OrientDB, went for the </span><a href=”https://orientdb.com/docs/2.0/orientdb.wiki/Tutorial-SQL.html”><span>relational query language SQL</span></a><span>.</span></p>
<p><span>Last year, Neo4j launched an initiative to unify Cypher, PGQL and G-Core under a single standard graph query language: </span><a href=”https://gql.today/”><span>GQL (Graph Query Language)</span></a><span>. The initiative will be discussed during a </span><a href=”https://www.w3.org/Data/events/data-ws-2019/”><span>W3C workshop in March 2019</span></a><span>. Some other query languages are especially dedicated to graph analysis such as </span><a href=”https://github.com/socialite-lang/socialite”><span>SociaLite</span></a><span>.</span></p>
<p><span>While not originally a graph query language, Facebook’s </span><a href=”https://graphql.org/”><span>GraphQL</span></a><span> is worth mentioning. This API language has been extended by graph database vendors to use as a graph query language. </span><a href=”https://docs.dgraph.io/master/query-language/”><span>Dgraph uses it natively</span></a><span> as its query language, Prisma is planning to </span><a href=”https://www.prisma.io/features/databases”><span>extend it to various graph databases</span></a><span> and Neo4j has been pushing it into </span><a href=”https://grandstack.io/”><span>GRANDstack</span></a><span> and its query execution layer </span><a href=”https://github.com/neo4j-graphql/neo4j-graphql-js”><span>neo4j-graphql.js</span></a><span>.<br/> <br/></span></p>
<p>This article was originally posted on <a href=”https://linkurio.us/blog/graphtech-ecosystem-2019-part-2-graph-analytics/?utm_source=analyticsbridge&amp;utm_medium=article&amp;utm_content=07″ target=”_blank” rel=”noopener”>Linkurious blog</a>. It is part of a series of articles about the GraphTech ecosystem. This is the second part. It covers the graph analytics landscape. The first part introduced the <a href=”https://linkurio.us/blog/graphtech-ecosystem-2019-part-1-graph-databases/?utm_source=analyticsbridge&amp;utm_medium=article&amp;utm_content=07″ target=”_blank” rel=”noopener”>graph database vendors</a>.</p>



From Infinite Matrices to New Integration Formula tag:www.analyticbridge.datasciencecentral.com,2019-02-04:2004291:BlogPost:391083
2019-02-04T00:30:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>This is another interesting problem, off-the-beaten-path. It ends up with a formula to compute the integral of a function, based on its derivatives solely. </p>
<p>For simplicity, I’ll start with some notations used in the context of matrix theory, familiar to everyone: T(<em>f</em>) = <em>g</em>, where <em>f</em> and <em>g</em> are vectors, and T a square matrix. The notation T(<em>f</em>) represents the product between the matrix T, and the vector <em>f</em>. Now, imagine that the dimensions are infinite, with <em>f</em> being a vector whose entries represent all the real numbers in some peculiar order. </p>
<p>In mathematical analysis, T is called an operator, mapping all real numbers (represented by the vector <em>f</em>) onto another infinite vector <em>g</em>. In other words, <em>f</em> and <em>g</em> can be viewed as real-valued functions, and T transforms the function <em>f</em> into a new function <em>g</em>.  A simple case is when T is the derivative operator, transforming any function <em>f</em> into its derivative <em>g</em> = d<i>f/</i>d<i>x</i>. We define the powers of T as T^0 = I (the identity operator, with I(<em>f</em>) = <em>f</em>), T^2(<em>f</em>) = T(T(<em>f</em>)), T^3(<em>f</em>) = T(T^2(<em>f</em>)) and so on, just like the powers of a square matrix. Now let the fun begins.</p>
<p><strong>Exponential of the Derivative Operator</strong></p>
<p>We assume here that T is the derivative operator. Using the same notation as above, we have the same formula as if T was a matrix:</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/954724656?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/954724656?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>Applied to a function <em>f</em>, we have:</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/957090133?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/957090133?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>This is a simple application of Taylor series. So the exponential of the derivative operator is a shift operator.</p>
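<p>A quick numerical check of this shift property, sketched here for the unit shift and f = sin, whose k-th derivative is sin(x + k·π/2):</p>

```python
# Numerical check that exp(T), with T the derivative operator, acts as a
# shift:  sum_k f^(k)(x) / k!  equals  f(x + 1).
import math

def sin_derivative(k, x):
    # The k-th derivative of sin cycles: sin, cos, -sin, -cos, ...
    return math.sin(x + k * math.pi / 2)

x = 0.3
# Partial sum of the series above -- the Taylor expansion of f at x.
shifted = sum(sin_derivative(k, x) / math.factorial(k) for k in range(25))
print(shifted, math.sin(x + 1))  # the two values agree
```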
<p><strong>Inverse of the Derivative Operator</strong></p>
<p>Likewise, as for matrices, we can define the inverse of T as</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/954798699?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/954798699?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>If T was a matrix, the condition for convergence is that <span>all of the eigenvalues of T – I have absolute value smaller than 1.</span> For the derivative operator T applied to a function <em>f</em>, and under some conditions that guarantee convergence, it is easy to show that</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/954846726?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/954846726?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>The coefficients (for instance 1, -4, 6, -4, 1 in the last term displayed above) are just the binomial coefficients, with alternating signs.</p>
<p>We call the inverse of the derivative operator the <em>pseudo-integral</em> operator. It is easy to prove that the pseudo-integral operator (as defined above), applied to the exponential function, yields the exponential function itself. So the exponential function is a fixed point (the only continuous one) of the pseudo-integral operator. More interestingly, in this case, the pseudo-integral operator is just the standard integral operator: they are both the same. Is this always the case regardless of the function <em>f</em>?  It turns out that this is true for any function <em>f</em> that can be written as </p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/954958880?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/954958880?profile=RESIZE_710x” class=”align-center”/></a></p>
<p>This covers a large class of functions, especially since the coefficients can also be complex numbers. These functions usually have a Taylor series expansion too. However, it does not apply to functions such as polynomials, because the formula does not converge in that case.</p>
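<p>For functions of this family the series defining the pseudo-integral can be summed explicitly. A small numerical sketch for a single term f(x) = exp(bx), with b chosen so that |1 − b| is smaller than 1: each application of the derivative operator T multiplies f by b, so (I − T)^k f = (1 − b)^k f, and the geometric series sums to f/b, the ordinary antiderivative.</p>

```python
# Sketch: the pseudo-integral  sum_k (I - T)^k  applied to f(x) = exp(b*x).
# Since T f = b*f here, (I - T)^k f = (1 - b)^k f, and for |1 - b| < 1 the
# series converges to f/b -- the standard antiderivative of exp(b*x).
import math

b, x = 0.8, 1.2
f = math.exp(b * x)
pseudo_integral = sum((1 - b) ** k * f for k in range(200))
print(pseudo_integral, f / b)  # both equal exp(b*x)/b
```

<p>Note that for b = 1 only the k = 0 term survives, recovering the fixed-point property of the exponential function mentioned above.</p>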
<p>In short, we have found a formula to compute the integral of a function, based solely on the function itself and its successive derivatives. The same technique can be used to invert more complicated linear operators, such as Laplace transforms.</p>
<p><strong>Exercise</strong></p>
<p>Apply the derivative operator to the pseudo-integral of a function <em>f</em>, using the above formula for the pseudo-integral. The result should be equal to <em>f</em>. This is the case if <em>f</em> belongs to the same family of functions as described above. Can you identify functions not belonging to that family of functions, for which the theory is still valid? Hint: try <em>f</em>(<em>x</em>) = exp(<i>b</i> <em>x</em>^2) or <em>f</em>(<em>x</em>) = <em>x</em> exp(<em>b</em> <em>x</em>), where <i>b</i> is a parameter.</p>
<p><em>To not miss this type of content in the future,<span> </span><a href=”https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter”>subscribe</a><span> </span>to our newsletter. For related articles from the same author, <a href=”http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles” target=”_blank” rel=”noopener”>click here</a><span> </span>or visit<span> </span><a href=”http://www.vincentgranville.com/” target=”_blank” rel=”noopener”>www.VincentGranville.com</a>. Follow me<span> </span><a href=”https://www.linkedin.com/in/vincentg/” target=”_blank” rel=”noopener”>on LinkedIn</a>, or visit my old web page<span> </span><a href=”http://www.datashaping.com”>here</a>.</em></p>
<p><span style=”font-size: 14pt;”><b>DSC Resources</b></span></p>
<ul>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/new-books-and-resources-for-dsc-members”>Book and Resources for DSC Members</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources”>Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel”>Advanced Machine Learning with Basic Excel</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning”>Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles”>Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href=”http://careers.analytictalent.com/jobs/products”>Hire a Data Scientist</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/page/search?q=Python”>Search DSC</a><span> </span>|<span> </span><a href=”http://www.analytictalent.com”>Find a Job</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blog/new”>Post a Blog</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/forum/topic/new”>Forum Questions</a></li>
</ul>
<p><span>Follow us: </span><a href=”https://twitter.com/DataScienceCtrl”>Twitter</a><span> | </span><a href=”https://www.facebook.com/DataScienceCentralCommunity/”>Facebook</a></p>



Graph Analytics to Reinforce Anti-fraud Programs tag:www.analyticbridge.datasciencecentral.com,2019-01-22:2004291:BlogPost:390515
2019-01-22T07:30:00.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux



<p></p>
<p><span>Organizations across industries are adopting graph analytics to reinforce their anti-fraud programs. In this post, we examine three types of fraud graph analytics can help investigators combat: insurance fraud, credit card fraud, and VAT fraud.</span></p>
<h1><span>Detecting fraud is about connecting the dots</span></h1>
<p><span><br/></span> <span>In many areas, fraud investigators have at their disposal large datasets in which clues are hidden. These clues are left behind by criminals who, on their side, try to hide their activity behind layers of more or less intricate schemes. To unveil illegal activities, investigators have to connect the pieces of the puzzle to discover evidence of wrongdoing.</span></p>
<p><span>Most anti-fraud applications are able to connect simple data points together to detect suspicious behaviors: an IP address to a user, withdrawal activities to a place of residence, or a loan request history to a client.</span></p>
<p><span>But these applications fall short on more complex analysis that would involve several levels of relationships or data types. This is mostly due to the technology on which these applications often rely and the data silos it creates. The relational databases that emerged in the ’80s are efficient at storing and analyzing tabular data but their underlying data model makes it difficult to connect data scattered across multiple tables.</span></p>
<p><span>The graph databases we’ve seen emerge in recent years are designed for this purpose. Their data model is particularly well-suited to store and to organize data where <a href=”https://linkurio.us/blog/unlocking-value-connected-data/”>connections are as important as individual data points</a>. Connections are stored and indexed as first-class citizens, making it an interesting model for investigations in which you need to connect the dots. In this post, we review three common fraud schemes and see how a graph approach can help investigators defeat them.</span></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837232028?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837232028?profile=RESIZE_710x” class=”align-center”/></a></p>
<h1><span>3 types of fraud graph analytics can combat<br/></span></h1>
<h2 id=”insurancefraud”>1) Insurance fraud</h2>
<p><span>Insurance fraud encompasses any act committed with the intent of defrauding an insurance process. It ranges from staged car accidents to faked deaths or exaggerated property damages. The FBI estimates that </span><a href=”https://www.fbi.gov/stats-services/publications/insurance-fraud”><span>insurance fraud costs $40 billion</span></a><span> per year in the U.S.</span></p>
<p><span>As an example, people frequently team up and put together fake road traffic accident (RTA) claims, in which they report hard-to-disprove, light, personal injuries. Those fraud rings involve several criminals playing the various roles of drivers, passengers, witnesses and even doctors that certify injuries, or accomplice lawyers that file the claim.</span></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837233894?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837233894?profile=RESIZE_710x” class=”align-center”/></a></p>
<p><span>There are too many claims filed every day for insurance analysts to analyze manually. Fraud investigation units have to rely on simple business rules to identify suspicious claims. But if the fraudsters made sure to avoid red flag case elements (unusual injury, recently purchased insurance policy, low velocity but significant injury, etc.), there is a chance they will go undetected and repeat the scheme.</span></p>
<p><span>This is where graph technology steps in. The graph approach brings data from various sources under a common model, so investigators can look at </span><i><span>all </span></i><span>the data at the same time, instead of isolated data silos. And this is exactly what they need because in these situations, what often gives away the fraudsters is abnormal connections to other elements.</span></p>
<p><span>These suspicious connections could be that the witness’s wife is connected to two similar cases, or that the doctor’s phone number is the same as that of a driver involved in another RTA claim, etc. Graph visualization and analysis platforms like Linkurious Enterprise allow investigators to pick up suspicious signs faster. They get a better understanding of the “big picture” and can identify abnormal connections to </span><a href=”https://linkurio.us/blog/whiplash-for-cash-using-graphs-for-fraud-detection/”><span>detect insurance fraud</span></a><span>.</span></p>
<p></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837238458?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837238458?profile=RESIZE_710x” class=”align-center”/></a></span></p>
<p><span><br/></span> <span>Above is an example graph visualization where we can identify one of those abnormal patterns that indicate insurance fraud through staged car accidents: Two customers (blue nodes) filed three claims (green nodes). We can identify a network of three customers connected through personal information such as phone (brown nodes), email (pink nodes) with the same lawyer (green node) involved every time. It is likely they are recycling stolen or fake identities to file fraudulent claims.</span></p>
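<p>The shared-attribute check behind this kind of visualization can be sketched in a few lines. The claims data, attribute names and ring-size threshold below are invented for illustration:</p>

```python
# Hypothetical sketch: group claims that share identifying attributes
# (phone, lawyer, ...).  Claims linked through a common value form a
# connected component; unusually large components suggest a fraud ring.
from collections import defaultdict

claims = {
    "claim1": {"phone": "555-0101", "lawyer": "J. Smith"},
    "claim2": {"phone": "555-0101", "lawyer": "J. Smith"},
    "claim3": {"phone": "555-0199", "lawyer": "J. Smith"},
    "claim4": {"phone": "555-0404", "lawyer": "A. Jones"},
}

# Index claims by each (attribute, value) pair.
by_value = defaultdict(set)
for claim, attrs in claims.items():
    for attr, value in attrs.items():
        by_value[(attr, value)].add(claim)

# Merge claims connected through any shared value (simple union-find).
parent = {c: c for c in claims}
def find(c):
    while parent[c] != c:
        parent[c] = parent[parent[c]]
        c = parent[c]
    return c
for group in by_value.values():
    first, *rest = sorted(group)
    for other in rest:
        parent[find(other)] = find(first)

rings = defaultdict(set)
for c in claims:
    rings[find(c)].add(c)
suspicious = [sorted(r) for r in rings.values() if len(r) >= 3]
print(suspicious)  # claim1-3 are linked through a shared phone and lawyer
```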
<h2 id=”creditcardfraud”><span>2) Payment card fraud</span></h2>
<p><span>Payment card fraud takes the form of criminals getting ahold of credit card information and proceeding to create unauthorized transactions. Card-present scenarios, in which criminals use a stolen or counterfeit credit card at an ATM or at the point-of-sale (POS) terminal of a physical store, affected </span><a href=”https://geminiadvisory.io/card-fraud-on-the-rise/”><span>45.8 million cards in the U.S.</span></a><span> in 2018. Despite a massive migration to the safer chip-based card, stolen credit card fraud is still a major issue.</span></p>
<p><span>In a commonly encountered situation, a criminal proceeds the following way:</span></p>
<ul>
<li><span>set up skimming devices at ATMs or gas pumps to steal the details stored in cards’ magnetic stripes;</span></li>
<li><span>replicate the stolen card information into a counterfeit card;</span></li>
<li><span>use the stolen cards to withdraw money at ATMs, buy goods or gift cards at shops;</span></li>
<li><span>cardholders notice unusual activity on their bank accounts and notify the authorities.</span></li>
</ul>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837240783?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837240783?profile=RESIZE_710x” class=”align-center”/></a></p>
<p></p>
<p><span>These situations are a perfect case for graph technology. While traditional technologies will hardly allow you to create a ‘big picture’ of heterogeneous data, the graph approach lets you collect the data in a model linking together: cardholders, transactions, terminals, and locations.</span></p>
<p><span>This way, when authorities are confronted with a surge of card-present fraud cases in a given region, graph technology can help </span><a href=”https://linkurio.us/blog/stolen-credit-cards-and-fraud-detection-with-neo4j/”><span>identify the common point of compromise</span></a><span> by highlighting the common links within the various reported cases, no matter how large the dataset is. Credit card fraud is thus another type of fraud graph analytics can help detect and fight.</span></p>
<p></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837247277?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837247277?profile=RESIZE_710x” class=”align-center”/></a></p>
<p></p>
<p><span>Above is an example of a graph visualization to identify a common point of compromise: Clients (blue nodes) report fraudulent purchases (orange nodes). We can identify through connections the common ATM (purple) where they made a withdrawal before the card was compromised.</span></p>
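<p>The point-of-compromise search itself comes down to intersecting the terminals used by each compromised card before the fraud appeared. A minimal sketch, with invented withdrawal histories:</p>

```python
# Sketch of the common point-of-compromise search: intersect the
# terminals visited by each card reported as compromised.
withdrawals = {
    "card_A": ["ATM_1", "ATM_7", "ATM_3"],
    "card_B": ["ATM_2", "ATM_7"],
    "card_C": ["ATM_7", "ATM_5"],
}

common = set.intersection(*(set(history) for history in withdrawals.values()))
print(common)  # ATM_7 is the likely skimming location
```

<p>On real data the same idea runs as a graph query over card, transaction and terminal nodes, which scales to datasets far beyond what manual review can cover.</p>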
<h2 id=”vatfraud”><span>3) VAT fraud</span></h2>
<p><span>Carousel fraud, also known as missing trader fraud or VAT fraud, is the theft of VAT collected on the sale of goods initially bought VAT-free in another jurisdiction. This scheme is difficult to identify in time and losses can be massive, as recent cases have shown.</span></p>
<p><span>In 2018, a single </span><a href=”https://www.europol.europa.eu/newsroom/news/eu-wide-vat-fraud-organised-crime-group-busted”><span>VAT fraud ring</span></a><span> cost the European economy more than 60 million euros. The criminal organization was selling products online through a wide network of shell companies and producing false invoices to perform VAT fraud. Generally, this is how the carousel works:</span></p>
<ul>
<li><span>Company A sells the goods to company B VAT-free</span></li>
<li><span>Company B sells the goods to company C, charging the VAT</span></li>
<li><span>Company C sells the goods and claims a VAT refund from the tax agency of country A</span></li>
</ul>
<p></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837249246?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837249246?profile=RESIZE_710x” class=”align-center”/></a></p>
<p></p>
<p><span>Those schemes are intricate and transactions quickly follow one another to avoid raising suspicion. To make sense of the layers behind which criminals hide, investigators need an overview of the situation. Once again, graph technology can help bring together various data types to get a better understanding of the financial context.</span></p>
<p><span>Then, platforms like Linkurious Enterprise provide support for pattern finding activity, leveraging the flexible query semantic of graph databases. Investigators can search across vast data collections for patterns indicative of the carousel: for example multiple transactions occurring in a short amount of time between companies from two different countries with a newly created intermediary company. From there, investigators can monitor flagged patterns and </span><a href=”https://linkurio.us/blog/vat-fraud-mysterious-case-missing-trader/”><span>assess the existence of potential carousel fraud</span></a><span>.</span></p>
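<p>At its core, the carousel pattern is a cycle in the directed graph of inter-company sales, so a first-pass search can simply enumerate cycles. A minimal sketch with NetworkX on an invented sales network:</p>

```python
# Sketch of the carousel search as a cycle search in the directed graph
# of inter-company sales.  Companies and transactions are made up.
import networkx as nx

sales = [("Company A", "Company B"),   # VAT-free cross-border sale
         ("Company B", "Company C"),   # domestic sale, VAT charged
         ("Company C", "Company A"),   # goods loop back: carousel suspect
         ("Company C", "Company D")]   # ordinary sale, not part of a loop
G = nx.DiGraph(sales)

cycles = list(nx.simple_cycles(G))
print(cycles)  # one 3-company loop to investigate
```

<p>In practice investigators would add the timing and country constraints described above to keep only loops that complete suspiciously fast.</p>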
<p></p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/837251824?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/837251824?profile=RESIZE_710x” class=”align-center”/></a></p>
<p></p>
<p>Above is an example of a visualization to identify chains of transactions in VAT fraud: Companies (blue nodes) and their parent organizations (flag nodes) sell goods VAT-free and collect back VAT through complex layers of sales between EU and non-EU countries.</p>
<p></p>
<p id=”2627″ class=”graf graf–p graf-after–p”>Today, organizations use graph technology to fight fraud across activity sectors: insurance, banking, law enforcement or financial administrations. It is a complementary approach to traditional statistical and relational technologies because it gives the opportunity to look for clues within data connections, which is where the value often lies when it comes to fraud.</p>
<p id=”6f3f” class=”graf graf–p graf-after–p graf–trailing”>(Initially published on<span> </span><a href=”https://linkurio.us/blog/3-fraud-graph-analytics-help-defeat/” class=”markup–anchor markup–p-anchor” rel=”nofollow noopener” target=”_blank”>linkurio.us blog</a>)</p>
<p></p>



Mining Customer Reviews to drive Business Growth tag:www.analyticbridge.datasciencecentral.com,2019-01-24:2004291:BlogPost:390936
2019-01-24T22:30:00.000Z


Kaniska Mandal
https://www.analyticbridge.datasciencecentral.com/profile/KaniskaMandal



<p class=”p1″><span class=”s1″>A passionate customer always provides feedback about his favorite product if it strikes an emotional chord.</span></p>
<p class=”p1″><span class=”s1″>Product reviews contain a wealth of information. Analyzing the review texts can unearth many hidden data points about the customer and the product. Such insights can help grow the business and increase revenue.</span></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>Let’s look at a specific example. </span></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>Our customer Bob decides to buy a wedge pillow. </span></p>
<p class=”p1″><span class=”s1″><a href=”https://storage.ning.com/topology/rest/1.0/file/get/873092412?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/873092412?profile=RESIZE_710x” class=”align-left”/></a></span></p>
<p class=”p1″><span class=”s1″>He provides an in-depth feedback after using the pillow.</span></p>
<p class=”p1″><span class=”s1″><i>I have suffered with Gerd, Gastritis and Esophagitis for 1yr now and have been to several doctors and taken numerous medicine. All doctors told me to sleep on an incline and add blocks under my bed but I did not want to elevate both me and my wife so I slept on 3 pillows for over a year. Now I have arthritis in my neck and sleeping on 3 pillows have not done much to keep the acid down out of my throat. This wedge pillow does a good job of not just elevating your head but it raises your entire upper abdomen to keep heartburn away from this area. I used to get up every night because of heartburn, bloating and stomach pain ……..</i></span></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>So what do we learn when we read the whole text?</span></p>
<p class=”p1″><span class=”s1″><b>Our customer is not too happy</b></span><span class=”s2″>☹</span> <span class=”s1″><b>… but his review comments provide interesting insights</b></span><span class=”s2″>☺</span></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>Let’s now try to extract key signals and categorize them.</span></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>Health Concerns -&gt; <strong><i>‘</i></strong></span><span class=”s2″><strong>now my neck has become very stiff and painful</strong>’</span></p>
<p class=”p2″><span class=”s1″>Product Reference -&gt;<span class=”Apple-converted-space”> </span></span> <strong><span class=”s2″>Get Rolled-up Cheap Pillow</span></strong></p>
<p class=”p1″><span class=”s1″>Positive Feedback -&gt;</span> <strong><span class=”s2″>This pillow keeps food down and acid down</span></strong></p>
<p class=”p2″><span class=”s1″>Missing Feature -&gt;<span class=”Apple-converted-space”> </span></span> <span class=”s2″><b>does not have a steep incline</b></span></p>
<p class=”p2″></p>
<p class=”p2″>So it would be great if we could build a system to automatically extract such signals and share the insights through interactive visualization.</p>
<p class=”p2″>Quick high level view of the system components:</p>
<p class=”p2″><a href=”https://storage.ning.com/topology/rest/1.0/file/get/873197776?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/873197776?profile=RESIZE_710x” class=”align-left” style=”padding: 1px;”/></a></p>
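<p>Before reaching for the full pipeline, the signal-extraction step can be prototyped with simple lexicon matching. The lexicons and sentences below are made up for illustration:</p>

```python
# Hypothetical sketch of rule-based signal extraction: match review
# sentences against small per-category lexicons.  A production system
# would add tokenization, lemmatization and sentiment scoring.
review = ("Now I have arthritis in my neck. "
          "This wedge pillow does a good job of keeping acid down. "
          "I wish it had a steeper incline.")

# Tiny, invented lexicons per signal category.
lexicons = {
    "health_concern": {"arthritis", "heartburn", "gerd", "gastritis"},
    "positive_feedback": {"good", "great", "helps"},
    "missing_feature": {"wish", "lacks", "missing"},
}

signals = {}
for sentence in review.split(". "):
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    for category, lexicon in lexicons.items():
        if words & lexicon:
            signals.setdefault(category, []).append(sentence)

print(sorted(signals))  # each category picked up one sentence
```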
<p class=”p1″><span class=”s1″><b>Technical Work Flow</b></span></p>
<ul>
<li class=”p1″><span class=”s1″><b>Ingest Review Streams<span class=”Apple-converted-space”> </span> (Real-time)<span class=”Apple-converted-space”> </span> [ Kafka -&gt; Spark ]</b></span></li>
<li class=”p1″><span class=”s1″><b>Store raw text in document index store for free form text search</b></span></li>
<li class=”p1″><span class=”s1″><b>Analyze incoming data asynchronously</b></span><ul>
<li class=”p1″>Text analysis [ NLP using Spark-ML ]<ul>
<li class=”p1″>Tokenize (lowercase, split)</li>
<li class=”p1″>Clean (remove stop word)</li>
<li class=”p1″>Normalize (lemmatize, stem)</li>
</ul>
</li>
<li class=”p1″>vectorize attributes and look up historical vectorized data to run a periodic NLP model-training workflow</li>
<li class=”p1″>match significant product terms by referring to the [Product Taxonomy]</li>
<li class=”p1″>match the buyer’s preferences [Buyer’s Profile]</li>
<li class=”p1″>match medical terms [Medical Ontology and Vocabs]</li>
<li class=”p1″>discover new products and topics using LDA (Latent Dirichlet Allocation)</li>
<li class=”p1″>detect positive and negative features</li>
<li class=”p1″>run sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner)</li>
<li class=”p1″>enrich the results by combining them with product ratings, product attribute ratings, and review votes</li>
<li class=”p1″>extract and match user interests</li>
<li class=”p1″>it’s also very important to detect plagiarism in reviews</li>
</ul>
</li>
<li class=”p1″>Store current insights in Redis / DynamoDB for quick lookup, and stream them over WebSockets</li>
<li class=”p2″><span class=”s1″><strong>Visualize real-time insights</strong></span></li>
<li class=”p2″><b>Historical analysis [ Elasticsearch / Hadoop ]</b><ul>
<li class=”p2″>periodically aggregate the above insights</li>
<li class=”p2″>refine the product offering based on historical insights</li>
<li class=”p2″>compare product popularity by category</li>
<li class=”p2″>estimate demand based on the signals</li>
<li class=”p2″>recommend products based on attributes</li>
<li class=”p2″>find the hidden customers (channels / stores) that need to buy in bulk, and supply items to them</li>
<li class=”p2″>grow inventory and replenish items in local stores</li>
<li class=”p2″>improve customer retention through personalized offers based on what each user liked and didn’t like</li>
<li class=”p2″>sell in bulk, at a discounted price, to channels discovered from the product texts</li>
<li class=”p2″>extract health concerns, correlate them with medical conditions, drug information, and safety warnings, and generate health recommendations and an aggregated health score</li>
</ul>
</li>
<li class=”p2″><span class=”s1″>Store aggregated and structured results in a data warehouse (Cassandra or Redshift)</span></li>
<li class=”p2″>Visualize summary reports, insights, and trends</li>
</ul>
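<p class=”p2″>As a toy illustration of the tokenize / clean / normalize steps in the workflow above, here is a pure-Python sketch (the stop-word list and the naive suffix-stripping “stemmer” are invented simplifications; a production pipeline would use Spark-ML or a real NLP library):</p>

```python
import re

STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "my", "has"}  # toy list

def tokenize(text):
    # Lowercase and split on runs of non-alphanumeric characters
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def stem(token):
    # Naive suffix stripping as a stand-in for a real stemmer/lemmatizer
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in remove_stop_words(tokenize(text))]

print(preprocess("Now my neck has become very stiff and painful"))
```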
<p class=”p3″></p>
<p class=”p1″><b>To extract the hidden patterns and correlated signals described above, we can apply techniques such as word embeddings and Recurrent Neural Networks.</b></p>
<p class=”p1″><b><a href=”https://storage.ning.com/topology/rest/1.0/file/get/873403488?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/873403488?profile=RESIZE_710x” class=”align-left” style=”padding: 1px;” width=”238″ height=”243″/></a></b></p>
<p class=”p1″></p>
<p class=”p1″></p>
<p class=”p1″><span class=”s1″>Word Embeddings [1]</span></p>
<ul>
<li class=”p1″>Document vs. Word Representations</li>
<li class=”p1″>Word2Vec vs. Med2Vec</li>
<li class=”p1″>GloVe</li>
<li class=”p1″>Embeddings in Deep Learning</li>
<li class=”p1″>Visualizing Word Vectors: t-SNE</li>
</ul>
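<p class=”p1″>To see what embeddings buy us, here is a minimal sketch of cosine similarity between word vectors (the 3-dimensional vectors below are invented for illustration; real ones would come from Word2Vec or GloVe):</p>

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: close to 1 means "similar"
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-dimensional "embeddings" (invented for illustration)
vectors = {
    "pillow":  [0.9, 0.1, 0.0],
    "cushion": [0.8, 0.2, 0.1],
    "cereal":  [0.1, 0.9, 0.3],
}

print(cosine_similarity(vectors["pillow"], vectors["cushion"]))  # close to 1
print(cosine_similarity(vectors["pillow"], vectors["cereal"]))   # much lower
```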
<p class=”p2″><span class=”s2″>VADER (Valence Aware Dictionary and sEntiment Reasoner) can help evaluate </span><span class=”s1″>buyer sentiment variations and the positive/negative feedback ratio, enrich feature-attribute weights, and feed into all of the product metrics computations explained above.</span></p>
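<p class=”p2″>The real VADER analyzer ships as a Python package, but the core lexicon-plus-rules idea can be sketched in a few lines (the tiny lexicon and the single negation rule below are invented simplifications; VADER itself also handles intensifiers, punctuation, capitalization, and more):</p>

```python
# Toy lexicon-based scorer illustrating the idea behind VADER
LEXICON = {"great": 2.0, "good": 1.5, "painful": -2.0, "stiff": -1.0, "cheap": -0.5}
NEGATIONS = {"not", "no", "never"}

def sentiment_score(text):
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0.0)
        # Flip polarity when the previous token is a negation (a simplified rule)
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score

print(sentiment_score("this pillow is great"))          # positive
print(sentiment_score("my neck is stiff and painful"))  # negative
print(sentiment_score("not great"))                     # negation flips polarity
```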
<p class=”p2″><span class=”s1″>Finally, we can generate highly useful visualizations and use them for product enhancement and for improving the overall buyer experience.</span></p>
<p class=”p2″><span class=”s1″>Let’s get back to the original feedback on the wedge pillow and see the insights we can gain.</span></p>
<p class=”p2″><span class=”s1″>It’s noteworthy how easily one can find the opportunity to sell wedge pillows to rehabilitation centers that need them for their patients.</span></p>
<p class=”p2″><span class=”s1″>Many customers who buy wedge pillows have experienced some sort of knee problem.</span></p>
<p class=”p2″></p>
<p class=”p2″>Just to understand the power of the knowledge that can be extracted from reviews, let’s quickly look into the insights gained from a set of reviews of ‘Cream of Wheat: Whole Grain Hot Cereal’.</p>
<p class=”p2″><a href=”https://storage.ning.com/topology/rest/1.0/file/get/873576626?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/873576626?profile=RESIZE_710x” class=”align-left” style=”padding: 1px;”/></a></p>
<p class=”p2″></p>
<p class=”p2″><span class=”s1″>It’s interesting to discover that this particular food item helps Alzheimer’s patients, and that mostly elderly people or persons with throat problems prefer it.</span></p>
<p class=”p2″></p>
<p class=”p2″><span class=”s1″><a href=”https://storage.ning.com/topology/rest/1.0/file/get/873583340?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/873583340?profile=RESIZE_710x” class=”align-left” style=”padding: 1px;”/></a></span></p>
<p class=”p2″></p>
<p class=”p2″><span class=”s1″>Mining product review data can be genuinely fun, and it can turn customer feedback into a continuous source of revenue.</span></p>



5 reasons why graph visualization matters tag:www.analyticbridge.datasciencecentral.com,2019-01-11:2004291:BlogPost:390485
2019-01-11T16:25:33.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux

<p>Why is graph visualization so important? How can it help businesses sifting through large amounts of complex data? We explore the answer in this post through 5 advantages of graph visualization and different use cases.</p>
<h1><span>What is graph visualization</span></h1>
<p><span>Also called a network, a graph is a collection of nodes (or vertices) and edges (or links). Each node represents a single data point (a person, a phone number, a transaction) and each edge represents how two nodes…</span></p>


<p>Why is graph visualization so important? How can it help businesses sifting through large amounts of complex data? We explore the answer in this post through 5 advantages of graph visualization and different use cases.</p>
<h1><span>What is graph visualization</span></h1>
<p><span>Also called a network, a graph is a collection of nodes (or vertices) and edges (or links). Each node represents a single data point (a person, a phone number, a transaction) and each edge represents how two nodes are connected (a person </span><i><span>possesses </span></i><span>a phone number, for example). This way of representing data is well suited for scenarios involving connections (social networks, telecommunication networks, protein interactions, and a lot more).</span></p>
<p><span>Graph visualization is the visual representation of the nodes and edges of a graph. Dedicated algorithms, called layouts, calculate the node positions and display the data on two (sometimes three) dimensional spaces. Graph visualization tools provide user-friendly web interfaces to interact and explore graph data.</span></p>
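<p>This node/edge model can be sketched in a few lines of code (the person and phone number below are invented examples):</p>

```python
# A graph as a dict of typed nodes plus a list of labeled edges
nodes = {"Alice": "person", "555-0101": "phone_number"}
edges = [("Alice", "possesses", "555-0101")]

def neighbors(node):
    # Nodes connected to `node` by any edge, in either direction
    out = [dst for src, _label, dst in edges if src == node]
    inc = [src for src, _label, dst in edges if dst == node]
    return out + inc

print(neighbors("Alice"))  # ['555-0101']
print(nodes["555-0101"])   # 'phone_number'
```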
<div id=”attachment_5890″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”></p>
<p class=”wp-caption-text” style=”text-align: center;”><a href=”https://storage.ning.com/topology/rest/1.0/file/get/726169892?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/726169892?profile=RESIZE_710x” class=”align-center”/></a></p>
<p class=”wp-caption-text” style=”text-align: center;”>A simple graph visualization made with Linkurious Enterprise – 9 nodes representing investors (blue), companies (green) and market (orange) and 8 edges indicating how they are connected.</p>
</div>
<p><span><br/>These graph visualizations are simply visualizations of data modeled as graphs. Any type of data asset that contains information about connections can be modeled and visualized as a graph, even data initially stored in a tabular way. For instance, the data from our example above could be extracted from a simple spreadsheet as depicted below.<br/><br/></span></p>
<table style=”margin-left: auto; margin-right: auto;”>
<tbody><tr><td><span>Company ID</span></td>
<td><span>Company name</span></td>
<td><span>Investors name</span></td>
<td><span>Market</span></td>
</tr>
<tr class=”alt-table-row”><td><span>1</span></td>
<td><span>Systran</span></td>
<td><span>Softbank Ventures Korea</span></td>
<td><span>Software</span></td>
</tr>
<tr><td><span>2</span></td>
<td><span>Exakis</span></td>
<td><span>Naxicap Partners; IRDI-ICSO; IRDI Midi Pyrenees</span></td>
<td><span>Software</span></td>
</tr>
<tr class=”alt-table-row”><td><span>3</span></td>
<td><span>Voluntis</span></td>
<td><span>Qualcomm</span></td>
<td><span>Software</span></td>
</tr>
</tbody>
</table>
<p style=”text-align: center;”><em><span>A table-based model of our first example</span></em></p>
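<p>Turning such a table into a graph is mechanical: each row yields one edge per investor plus one edge to the market. A sketch using the three rows above (the edge labels are invented):</p>

```python
rows = [
    {"id": 1, "company": "Systran", "investors": "Softbank Ventures Korea", "market": "Software"},
    {"id": 2, "company": "Exakis", "investors": "Naxicap Partners; IRDI-ICSO; IRDI Midi Pyrenees", "market": "Software"},
    {"id": 3, "company": "Voluntis", "investors": "Qualcomm", "market": "Software"},
]

edges = []
for row in rows:
    # One edge per investor (the column packs several, separated by ';')
    for investor in row["investors"].split(";"):
        edges.append((investor.strip(), "invests_in", row["company"]))
    edges.append((row["company"], "operates_in", row["market"]))

print(len(edges))  # 5 investor edges + 3 market edges = 8
```

<p>This reproduces the 9 nodes (3 companies, 5 investors, 1 market) and 8 edges of the earlier visualization.</p>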
<p><span><br/>The data could also be stored in a relational database or in a graph database, a system </span><a href=”https://neo4j.com/why-graph-databases/”><span>optimized for the storage and analysis of complex and connected data</span></a><span>.</span></p>
<p><span>In the end, graph visualization is a way to better understand and manipulate connected data. And it offers several advantages.  </span></p>
<h2><span style=”font-size: 18pt;”>The benefits of graph visualization</span></h2>
<p></p>
<p><span>Interactive visualization tools are an essential layer to identify insights and generate value from connected data. There are a number of reasons why graph visualization is useful:</span></p>
<ol>
<li><span>You will</span><b><span> </span>spend less time assimilating information</b><span> because the human brain processes visual information much faster than written information. Visually displaying data ensures faster comprehension, which, in the end, reduces the time to action.<br/></span></li>
<li><span>You have a</span><b><span> </span>higher chance to discover insights</b><span> by interacting with data. Graph visualization tools let you manipulate the data directly, encouraging you to explore and question it, which ultimately increases the possibility of discovering actionable insights. </span><a href=”https://www.tableau.com/sites/default/files/media/8604-ra-business-intelligence-analytics.pdf”><span>A study showed</span></a><span> that managers who use visual data discovery tools are 28% more likely to find timely information than those who rely solely on managed reporting and dashboards.<br/><br/></span></li>
<li><span>You can achieve a</span><b><span> </span>better understanding of a problem</b><span> by visualizing patterns and context. Graph visualization tools are perfect for visualizing relationships and for understanding the context of the data. You get a complete overview of how everything is connected, which allows you to identify trends and correlations in your data.<br/><br/></span></li>
<li><b>It’s an effective form of communication</b><span>. Visual representations offer a more intuitive way to understand the data and are an impactful medium to share your findings with decision-makers.<br/><br/></span></li>
<li><b>Everybody can work with graph visualization</b><span>, not only technical users. More users can access the insights since specific programming skills are not required to interact with graph visualizations. This increases the value creation potential.<br/><br/></span></li>
</ol>
<p><span>Let’s illustrate some of these benefits with a very simple example. We have a data sample of eleven individuals with information about who works with who. Below is the same data sample in two formats: a table and a graph visualization.<br/><br/></span></p>
<p style=”text-align: center;”><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/726173371?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/726173371?profile=RESIZE_710x” class=”align-center”/></a></span></p>
<div id=”attachment_5898″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”>Table of our data sample (click for full view)</p>
<p class=”wp-caption-text” style=”text-align: center;”></p>
<p class=”wp-caption-text” style=”text-align: center;”><a href=”https://storage.ning.com/topology/rest/1.0/file/get/726175663?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/726175663?profile=RESIZE_710x” class=”align-center”/></a></p>
</div>
<div id=”attachment_5892″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”>Graph visualization of our data sample (click for full view)</p>
</div>
<p><span><br/>In our second format, we’ve modeled the connections between persons as edges to obtain a graph.<br/> While in the first table it’s pretty hard to understand how those people work together, we get a clearer view with the graph visualization. We are able to distinguish two groups and an individual who seems to be the link between them, a pattern that we did not notice at first in the table.</span></p>
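<p>The “link” individual can also be found programmatically, as a cut vertex: a node whose removal splits the graph into several components. A sketch on an invented, smaller “works with” network (names are hypothetical; the post’s actual dataset is in the image above):</p>

```python
from collections import defaultdict

# Invented "works with" edges: two triangles joined only through Eve
edges = [("Ann", "Bob"), ("Bob", "Cara"), ("Ann", "Cara"),
         ("Dan", "Fay"), ("Fay", "Gus"), ("Dan", "Gus"),
         ("Cara", "Eve"), ("Bob", "Eve"), ("Eve", "Dan"), ("Eve", "Fay")]
nodes = {n for edge in edges for n in edge}

def count_components(nodes, edges):
    # Count connected components with an iterative depth-first search
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), 0
    for n in nodes:
        if n not in seen:
            components += 1
            stack = [n]
            while stack:
                cur = stack.pop()
                if cur not in seen:
                    seen.add(cur)
                    stack.extend(adj[cur] - seen)
    return components

# A "bridge" person is a cut vertex: removing them splits the network
bridges = [c for c in sorted(nodes)
           if count_components(nodes - {c}, [e for e in edges if c not in e]) > 1]
print(bridges)  # ['Eve']
```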
<p></p>
<h2><span style=”font-size: 18pt;”>How graph visualization is being used</span></h2>
<p></p>
<p><span><a href=”https://linkurio.us/blog/category/use-case/”>Many industries are using graph technology</a> to leverage their connected data and reach their goals. At Linkurious, we work with companies from a large variety of fields. Their common point, however, is the need to</span><b><span> </span>find connections or understand dependencies</b><span> within their data. Below are a few examples of typical use cases of graph visualization and the organizations that use it.<br/></span></p>
<p></p>
<p><b>Anti-Financial crime</b></p>
<p><span>Banks, insurance companies, and financial institutions have a common urgency to face: fraud. From money laundering to insurance fraud to bank fraud, each of these organizations must detect fraud schemes that are sometimes complex. The data visualized often combines customer information, claims details, financial records, and watch-listed individuals or organizations. For them, graph visualization is a good way to detect suspicious connections or patterns. It’s also an intuitive way to investigate fraud rings and the ramifications of criminal networks.</span></p>
<p></p>
<p><b>Cybersecurity</b></p>
<p><span>Today you’ll find cyber, or IT, security teams in many large organizations, financial institutions, and security consultancy services. Organizations need to protect themselves from threats such as zero-day vulnerabilities, DDoS attacks, and phishing. They collect data from servers, routers, application logs, and network status in order to detect suspicious activity. Graph visualization is a great tool for digesting this data and detecting suspicious patterns at a glance. It makes finding compromised elements easier thanks to the visual exploration of connections.</span></p>
<p></p>
<p><b>Intelligence</b></p>
<p><span>Almost every government has its intelligence agency. To support law enforcement, national security or military objectives, these organizations collect and analyze data from various sources. The detection and identification of terrorist networks, for instance, became a crucial objective in the past decades. Visualizing connections between people, emails, transactions or phone records is a key to ease such investigations.  </span></p>
<p></p>
<p><b>IT operations management</b></p>
<p><span>The field of IT operations management keeps growing with our increasing reliance on computer systems, networks and the growth of the Internet of Things. But because of the growing complexity of infrastructures, managing networks is often a challenge. Graph visualization allows IT managers to visualize dependencies between their assets (servers, switches, routers, applications, etc). It’s an intuitive way to perform impact or root cause analysis.</span></p>
<p></p>
<p><b>Enterprise architecture</b></p>
<p><span>Numerous mature organizations implement enterprise architecture management. It consists of synchronizing business and IT data. The goal is to analyze, plan, and transform the business processes, applications, data, and infrastructure to maintain the organization’s ability to change and innovate. With graph visualization, enterprise architects can visualize the organization’s assets and their dependencies. It helps to conduct impact analysis, obtain insights on the current situation (as-is), and plan the right actions.</span></p>
<p></p>
<p><b>Life science</b></p>
<p><span>Protein interactions, drug compositions, disease networks: for life science data analysis, almost everything is about connections and dependencies. However, the large amount of data often makes it difficult for researchers to identify insights and look for dependencies. Graph visualization makes large amounts of data more accessible and easier to read. It has many different applications, from linking drugs with adverse events and diseases with phenotypes to visualizing networks or understanding how diseases spread.</span></p>
<p></p>
<p><span>This article was initially posted on <a href=”https://linkurio.us/blog/why-graph-visualization-matters/” target=”_blank” rel=”noopener”>Linkurious blog</a></span></p>



New Books in AI, Machine Learning, and Data Science tag:www.analyticbridge.datasciencecentral.com,2018-12-02:2004291:BlogPost:389661
2018-12-02T01:26:14.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville

<p>We are in the process of writing and adding new material (compact eBooks) exclusively available to our members, and written in simple English, by world leading experts in AI, data science, and machine learning. In the upcoming months, the following will be added:</p>
<ul>
<li>The Machine Learning Coding Book</li>
<li>Off-the-beaten-path Statistics and Machine Learning Techniques </li>
<li>Encyclopedia of Statistical Science</li>
<li>Original Math, Stat and Probability Problems – with…</li>
</ul>


<p>We are in the process of writing and adding new material (compact eBooks) exclusively available to our members, and written in simple English, by world leading experts in AI, data science, and machine learning. In the upcoming months, the following will be added:</p>
<ul>
<li>The Machine Learning Coding Book</li>
<li>Off-the-beaten-path Statistics and Machine Learning Techniques </li>
<li>Encyclopedia of Statistical Science</li>
<li>Original Math, Stat and Probability Problems – with Solutions</li>
<li>Computational Number Theory for Data Scientists</li>
<li>Randomness, Pattern Recognition, Simulations, Signal Processing – New developments</li>
</ul>
<p>We invite you to<span> </span><a href=”https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter” target=”_blank” rel=”noopener”>sign up here</a><span> </span>so you don’t miss these free books. Previous material (also for members only) can be found<span> </span><a href=”https://www.datasciencecentral.com/page/member” target=”_blank” rel=”noopener”>here</a>.</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/135807237?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/135807237?profile=original” class=”align-center”/></a></p>
<p></p>
<p>Currently, the following content is available:</p>
<p><strong>1. Book: Enterprise AI – An Application Perspective</strong> </p>
<p>Enterprise AI: An applications perspective takes a use-case-driven approach to understanding the deployment of AI in the enterprise. Designed for strategists and developers, the book provides a practical and straightforward roadmap, based on application use cases, for AI in enterprises. The authors (Ajit Jaokar and Cheuk Ting Ho) are data scientists and AI researchers who have deployed AI applications for enterprise domains. The book is used as a reference for Ajit and Cheuk’s new course on Implementing Enterprise AI.</p>
<p>The table of contents is available<span> </span><a href=”https://www.datasciencecentral.com/profiles/blogs/free-ebook-enterprise-ai-an-applications-perspective” target=”_blank” rel=”noopener”>here</a>. The book can be accessed<span> </span><a href=”https://www.datasciencecentral.com/page/free-books-1″ target=”_blank” rel=”noopener”>here</a><span> </span>(members only).</p>
<p><strong>2. Book: Applied Stochastic Processes</strong></p>
<p>Full title:<span> </span><em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em>. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)</p>
<p>This book is intended to professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</p>
<p>New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.</p>
<p>The table of contents is available<span> </span><a href=”https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes” target=”_blank” rel=”noopener”>here</a>. The book can be accessed<span> </span><a href=”https://www.datasciencecentral.com/page/free-books-1″ target=”_blank” rel=”noopener”>here</a><span> </span>(members only).</p>
<p><span><b>DSC Resources</b></span></p>
<ul>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources”>Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel”>Advanced Machine Learning with Basic Excel</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning”>Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles”>Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href=”http://careers.analytictalent.com/jobs/products”>Hire a Data Scientist</a><span> </span>|<span> </span><a href=”https://www.datasciencecentral.com/page/search?q=Python”>Search DSC</a><span> </span>|<span> </span><a href=”http://www.analytictalent.com/”>Find a Job</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blog/new”>Post a Blog</a><span> </span>|<span> </span><a href=”https://www.datasciencecentral.com/forum/topic/new”>Forum Questions</a></li>
</ul>



Semantic Roles according to Word2Vec tag:www.analyticbridge.datasciencecentral.com,2018-05-07:2004291:BlogPost:383611
2018-05-07T06:00:00.000Z


Rosaria Silipo
https://www.analyticbridge.datasciencecentral.com/profile/RosariaSilipo

<p></p>
<p><em>Figure 1. Scatter plot of word embedding coordinates (coordinate #3 vs. coordinate #10). You can see that semantically related words are close to each other.</em></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220287711?profile=original” target=”_self”><img class=”align-center” src=”http://storage.ning.com/topology/rest/1.0/file/get/2220287711?profile=original” width=”590″></img></a></p>
<p>This blog post is an extract from chapter 6 of the book “<span><a href=”https://www.knime.com/knimepress/from-words-to-wisdom”><strong>From Words to Wisdom. An Introduction to Text Mining…</strong></a></span></p>


<p></p>
<p><em>Figure 1. Scatter plot of word embedding coordinates (coordinate #3 vs. coordinate #10). You can see that semantically related words are close to each other.</em></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220287711?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220287711?profile=original” class=”align-center” width=”590″/></a></p>
<p>This blog post is an extract from chapter 6 of the book “<span><a href=”https://www.knime.com/knimepress/from-words-to-wisdom”><strong>From Words to Wisdom. An Introduction to Text Mining with KNIME</strong></a></span>” by V. Tursi and R. Silipo, published by the <span><a href=”https://www.knime.com/knimepress”>KNIME Press</a></span>.  A more detailed version of this post is also available on the KNIME blog: “<span><a href=”https://www.knime.com/blog/word-embedding-word2vec-explained”>Word Embedding: Word2Vec Explained</a></span>”.</p>
<p></p>
<h2>Word2Vec Embedding</h2>
<p>Word embedding, like document embedding, belongs to the text preprocessing phase, specifically to the part that transforms a text into a row of numbers.</p>
<p>In the <span><a href=”https://www.knime.com/knime-text-processing”>KNIME Text Processing extension</a></span>, the Document Vector node transforms a sequence of words into a sequence of 0/1 – or frequency numbers – based on the presence/absence of a certain word in the original text. This is also called “one-hot encoding”. One-hot encoding, though, has two big problems:</p>
<ul>
<li>it produces a very large data table, possibly with a huge number of columns;</li>
<li>it produces a very sparse data table with a very high number of 0s, which might be a problem for training certain machine learning algorithms.</li>
</ul>
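<p>Both problems are easy to see in a toy one-hot encoding (the two example documents below are invented):</p>

```python
documents = [
    "word embedding reduces dimensionality",
    "one hot encoding is sparse",
]

# Vocabulary over all documents: one column per distinct word
vocab = sorted({w for doc in documents for w in doc.split()})

def one_hot(doc):
    # 1 if the word occurs in the document, else 0
    words = set(doc.split())
    return [1 if w in words else 0 for w in vocab]

vectors = [one_hot(doc) for doc in documents]
zeros = sum(v.count(0) for v in vectors)
total = len(vocab) * len(documents)
print(f"{len(vocab)} columns, {zeros}/{total} entries are zero")
```

<p>Even with two short sentences, half of the table is zeros; with a realistic vocabulary of tens of thousands of words, the table becomes both huge and almost entirely sparse.</p>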
<p>The Word2Vec technique was therefore conceived with two goals in mind:</p>
<ul>
<li>reduce the size of the word encoding space (embedding space);</li>
<li>compress in the word representation the most informative description for each word.</li>
</ul>
<p>Given a context and a word related to that context, we face two possible problems:</p>
<ul>
<li>from that context, predict the target word (Continuous Bag of Words or CBOW approach)</li>
<li>from the target word, predict the context it came from (Skip-gram approach)</li>
</ul>
<p></p>
<p>The <strong>Word2Vec technique</strong> is based on a feed-forward, fully connected architecture [1] [2] [3], where the context or the target word is presented at the input layer and the target word or the context is predicted at the output layer, depending on the selected approach. The output of the hidden layer is taken as a representation of the input word/context instead of the one-hot encoding representation. See KNIME blog post “<span><a href=”https://www.knime.com/blog/word-embedding-word2vec-explained”>Word Embedding: Word2Vec Explained</a></span>” for more details.</p>
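<p>The Skip-gram training data described above can be sketched as (target, context) pairs generated over a sliding window (the token list is invented; window size 1 for brevity):</p>

```python
def skipgram_pairs(tokens, window=1):
    # For each target word, emit (target, context) for neighbors within the window
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "mouse", "cancer", "study"]))
```

<p>Swapping the roles of target and context in these pairs gives the CBOW formulation instead.</p>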
<h2>Representing Words and Concepts with Word2Vec</h2>
<p>In KNIME Analytics Platform, there are a few nodes which deal with word embedding.</p>
<ul>
<li>The <strong>Word2Vec Learner</strong> node encapsulates the Word2Vec Java library from the <span><a href=”https://deeplearning4j.org/word2vec”>DL4J</a></span> library. It trains a neural network for either CBOW or Skip-gram. The neural network model is made available at the node output port.</li>
</ul>
<p></p>
<ul>
<li>The <strong>Vocabulary Extractor</strong> node runs the network on all vocabulary words learned during training and outputs their embedding vectors.</li>
</ul>
<p></p>
<ul>
<li>Finally, the <strong>Word Vector Apply</strong> node tokenizes all words in a document and provides their embedding vectors as generated by the Word2Vec neural network at its input port. The output is a data table where words are represented as sequences of numbers and documents are represented as sequences of words.</li>
</ul>
<p></p>
<p>The whole intuition behind the Word2Vec approach consists of representing a word based on its context. This means that words appearing in similar contexts will be similarly embedded. This includes synonyms, opposites, and semantically equivalent concepts. In order to verify this intuition, we built a workflow in KNIME Analytics Platform. The workflow is available for free download from the KNIME EXAMPLES server under:</p>
<p><em>08_Other_Analytics_Types/01_Text_Processing</em>/<em>21_Word_Embedding_Distance</em>.</p>
<p> </p>
<p>In this workflow we train a Word2Vec model on 300 scientific articles from <span><a href=”https://www.ncbi.nlm.nih.gov/pubmed/”>PubMed</a></span>. One set of articles has been extracted using the query “mouse cancer” and one set of articles using the query “human AIDS”.</p>
<p></p>
<p>After reading the articles, transforming them into documents, and cleaning up the texts in the “Pre-processing” wrapped metanode, we train a Word2Vec model with the Word2Vec Learner node. Then we extract all words from the model dictionary and we expose their embedding vectors, with a Vocabulary Extractor node. Finally, we calculate the Euclidean distances among vector pairs in the embedding space.</p>
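<p>The final step amounts to plain Euclidean distance between embedding vectors; a sketch with invented 4-dimensional vectors (the real workflow uses the vectors produced by the Vocabulary Extractor node):</p>

```python
import math

def euclidean(u, v):
    # Straight-line distance between two vectors in the embedding space
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Invented 4-dimensional embeddings for illustration
embeddings = {
    "cancer": [0.2, 0.7, 0.1, 0.5],
    "tumor":  [0.25, 0.65, 0.15, 0.45],
    "women":  [0.9, 0.1, 0.8, 0.2],
}

for a, b in [("cancer", "tumor"), ("cancer", "women")]:
    print(a, b, round(euclidean(embeddings[a], embeddings[b]), 3))
```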
<p></p>
<p>The figure at the top of this post shows some results of this workflow, i.e. the positioning of some of the dictionary words in the embedding space using an interactive scatter plot. For the screenshot above, we chose embedding coordinates #3 and #10.</p>
<p></p>
<p>In the embedding coordinate plot, “cancer” and “tumor” are very close, showing that they are often used as synonyms. Similarly, “AIDS” and “HIV” are also very close, as was to be expected. Notice that “mouse” is in between “AIDS”, “cancer”, “tumor”, and “HIV”. This is probably because most of the articles in the data set describe mouse-related findings for cancer and AIDS. The word “patients”, while still close to the diseases, is further away than the word “mouse”. Finally, “women” is on the opposite side of the plot, close to the word “breast”, which is also plausible. From this small plot and small dataset, the adoption of word embedding seems promising.</p>
<p></p>
<p><strong>Note.</strong> All disease-related words are very close to each other, for example “HIV” and “cancer”. Even though they refer to different diseases and different concepts, they are still the topic of most articles in the dataset. That is, from the point of view of semantic role, they can be considered equivalent and therefore end up close to each other in the embedding space.</p>
<p></p>
<p>Continuing to inspect the word pairs with the smallest distances, we find that “condition” and “transition”, as well as “approximately” and “determined”, are among the closest words. Similarly, unrelated words such as “sciences” and “populations”, “benefit” and “wide”, “repertoire” and “enrolled”, “rejection” and “reported”, and “Cryptococcus” and “academy” are located very close to each other in the embedding space.</p>
<p></p>
<h2>References</h2>
<p>[1] Le Q., Mikolov T. (2014) <span><a href="https://cs.stanford.edu/~quocle/paragraph_vector.pdf">Distributed Representations of Sentences and Documents</a></span>, Proceedings of the 31<sup>st</sup> International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&amp;CP volume 32.</p>
<p>[2] Analytics Vidhya (2017), <span><a href="https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/">An Intuitive Understanding of Word Embeddings: From Count Vectors to Word2Vec</a></span></p>
<p>[3] McCormick, C. (2016, April 19). <em>Word2Vec Tutorial – The Skip-Gram Model</em>. Retrieved from <span><a href="http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/">http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/</a></span></p>



Finding insights with graph analytics tag:www.analyticbridge.datasciencecentral.com,2018-10-04:2004291:BlogPost:388969
2018-10-04T15:30:00.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux



<p><span>From detecting anomalies to understanding what are the key elements in a network, or highlighting communities, graph analytics reveal information that would otherwise remain hidden in your data. We will see how to integrate your graph analytics with Linkurious Enterprise to detect and investigate insights in your connected data.</span></p>
<p><span id=”more-6665″></span></p>
<h2><span>What is graph analytics?</span></h2>
<h3><span>Definition and methods</span></h3>
<p></p>
<p><span>Graph analytics is a set of tools and methods for extracting knowledge from data modeled as a graph. The graph paradigm is ideal for making the most of connected data</span><span>, whose value resides for the most part in its relationships. But even with data modeled as a graph, extracting knowledge and producing insights can be challenging. Faced with multi-dimensional data and very large datasets, analysts need tools to accelerate the discovery of insights.</span></p>
<p></p>
<p><span>The field of graph theory has spawned multiple algorithms that analysts can rely on to find insights hidden in graph data. Below are some of the most popular graph algorithms and how they can help uncover insights in use cases such as fraud detection, network management, anti-money laundering, intelligence analysis or cybersecurity:</span></p>
<p></p>
<ul>
<li><b>Pattern matching algorithms<span> </span></b><span>identify one or several subgraphs with a given structure within a graph. Example: a company node with the country property containing “Luxembourg” connected to at least five officer nodes with a registered address in France.</span></li>
<li><b>Traversal and pathfinding algorithms<span> </span></b><span>determine paths between nodes within the graph, without knowing in advance which connections exist or how many of them separate the two nodes. In money laundering investigations, path analysis can help determine how money flows through a network of individuals, how it goes from company A to person B. Example: the <a href="https://en.wikipedia.org/wiki/Shortest_path_problem">shortest path algorithm</a>.</span></li>
<li><b>Connectivity algorithms<span> </span></b><span>find the minimum number of nodes or edges that need to be removed to disconnect the remaining nodes from each other. This is helpful, for instance, to determine weaknesses in an IT network and find out which infrastructure points are sensitive enough to take it down. Example: the <a href="https://en.wikipedia.org/wiki/Strongly_connected_component">Strongly Connected Components algorithm</a>.</span></li>
<li><b>Community detection algorithms</b><span> identify clusters, or groups of nodes, densely connected within the graph. This is particularly helpful to find groups of people that might belong to a common criminal organization. Examples: the <a href="https://en.wikipedia.org/wiki/Louvain_Modularity">Louvain method</a>, the label propagation algorithm.</span></li>
<li><b>Centrality algorithms</b><span> determine a node’s relative importance within a graph by looking at how connected it is to other nodes. They are used, for instance, to identify key people within organizations. Examples: the <a href="https://en.wikipedia.org/wiki/PageRank">PageRank algorithm</a>, degree centrality, closeness centrality, betweenness centrality.</span></li>
</ul>
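<p>To make the pathfinding family above more concrete, here is a minimal breadth-first search returning a shortest path (by hop count) through a toy money-flow graph. This is a plain-Python sketch, not a production algorithm; all node names are invented:</p>

```python
from collections import deque

# Toy directed graph: who sends money to whom
graph = {
    "Company A": ["Intermediary 1", "Intermediary 2"],
    "Intermediary 1": ["Shell Corp"],
    "Intermediary 2": ["Person B"],
    "Shell Corp": ["Person B"],
    "Person B": [],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: shortest path by hop count, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

print(shortest_path(graph, "Company A", "Person B"))
# ['Company A', 'Intermediary 2', 'Person B']
```

<p>In an investigation, such a path answers the question “how does money get from company A to person B” in the fewest possible steps.</p>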
<h3><span>Architecture blueprint for graph analytics</span></h3>
<p></p>
<p><span>Depending on your data, your use-case, and the questions you have to answer, technology and infrastructure can differ from one organization to another. But a generic graph analytics architecture usually consists of the following layers:</span></p>
<p></p>
<ul>
<li><span><strong>Linkurious Enterprise</strong>: the browser-based platform and its server are used by investigation teams to visualize and analyze graph data. It retrieves data in real-time from graph databases.</span></li>
<li><span><strong>Graph databases</strong>: transactional systems storing data as graphs and managing operations such as data retrieval or writing. They perfectly handle real-time queries, making them great online transaction processing (OLTP) systems.</span></li>
<li><span><strong>Graph processing systems</strong>: a set of analytical engines shipping with common graph algorithms and handling large-scale online analytical processing (OLAP) on graphs.</span></li>
</ul>
<div id="attachment_6667" class="wp-caption aligncenter"><img class="lazy size-full wp-image-6667 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_processing.jpg" alt="graph analytics Linkurious schema" width="738" height="508"/><p class="wp-caption-text">Architecture blueprint for graph analytics</p>
</div>
<p><span>Linkurious Enterprise acts as a front-end where analysts and investigators can easily retrieve information. The data accessed by Linkurious Enterprise is stored in a graph database. Graph databases are well suited for real-time querying and long-term persistence but are usually not designed for running complex graph algorithms at scale. As a result, our clients tend to push this sort of workload to dedicated graph processing frameworks such as <a href="http://spark.apache.org/">Spark</a>/<a href="https://spark.apache.org/graphx/">GraphX</a>. The results are then persisted back in the graph database as new properties (e.g. a PageRank score property) and thus become available to Linkurious Enterprise.</span></p>
<h2><span>Applying graph analytics to the Paradise Papers data</span></h2>
<p></p>
<p><span>In this section, we take a closer look at a real-life graph dataset, the </span><a href="https://offshoreleaks.icij.org/pages/database"><span>Paradise Papers dataset</span></a><span>, created by the ICIJ to <a href="https://linkurio.us/blog/big-data-technology-fraud-investigations/">investigate the world’s offshore finance industry</a>. We use Linkurious Enterprise to query, analyze and visualize the data using graph analytics tools and methods.</span></p>
<p></p>
<h3><span>The setup</span></h3>
<div id="attachment_6669" class="wp-caption aligncenter"><img class="lazy wp-image-6669 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_processing_2.png" alt="Linkurious graph analytics" width="744" height="553"/><p class="wp-caption-text">The setup used in our example</p>
</div>
<p><span>For the purpose of this example, we relied on the architecture pictured above:</span></p>
<ul>
<li><span>A Linkurious Enterprise instance</span></li>
<li><span>A <a href="https://linkurio.us/solution/neo4j/">Neo4j graph database</a></span></li>
<li><span>The </span><a href="https://neo4j.com/developer/graph-algorithms/"><span>Neo4j graph algorithms</span></a><span> library, a plugin that provides parallel versions of common graph algorithms for Neo4j, exposed as Cypher procedures.</span></li>
</ul>
<h3><span>The Paradise Papers dataset</span></h3>
<p></p>
<p><span>The dataset is made of 1,582,953 nodes and 2,398,680 edges. It aggregates data from four investigations of the ICIJ: the Offshore Leaks, the Panama Papers, the Bahamas Leaks and the Paradise Papers.</span></p>
<p></p>
<p><span>The graph data model has four types of nodes and three types of edges as depicted below.</span></p>
<p></p>
<div id="attachment_6672" class="wp-caption aligncenter"><img class="lazy wp-image-6672 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/data_model.png" alt="Paradise papers linkurious" width="543" height="355"/><p class="wp-caption-text">Graph data model of the Paradise Papers dataset</p>
</div>
<p><span>In the following sections, we will see how to use different graph analytics approaches such as graph pattern matching, PageRank analysis, and the Louvain community detection method. While implementing graph analytics requires some technical knowledge, we will see how Linkurious Enterprise can make graph analytics results accessible to every analyst via simple tools. Among these tools are query templates, an alert dashboard, and a visualization interface.</span></p>
<p></p>
<h3><span>Graph pattern matching in Linkurious Enterprise</span></h3>
<p></p>
<p><span>A simple method for identifying patterns in a graph is to use graph languages to describe the shape of the data you are looking for. As a developer, you can do it in the interface of your favorite graph database but also within the Linkurious Enterprise interface.</span></p>
<p></p>
<p><span>What if you want to be warned every time a certain graph pattern appears in your data? With the Linkurious Enterprise alert system, you can set up alerts for the graph patterns you want to monitor. Every time a new match is detected in the database, it’s recorded and made available for users to review. This is useful in a fraud monitoring context, for instance, where you’d want to be notified when instances of known fraud schemes occur.</span></p>
<p></p>
<p><span>In the video below, we set up a new alert in Linkurious Enterprise for a specific pattern. The alert contains a graph query looking for addresses tied to more than five entities or company officers.</span></p>
<p></p>
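<p>The logic behind such an alert can be sketched outside of any graph database: group relationships by address and flag every address tied to more than five entities or officers. A plain-Python illustration with invented data:</p>

```python
from collections import Counter

# Toy (address, linked node) relationships
links = [("12 Harbour St", f"Entity {i}") for i in range(7)] + [
    ("3 Main Rd", "Officer X"),
    ("3 Main Rd", "Entity 99"),
]

# Count how many linked nodes each address has
counts = Counter(addr for addr, _ in links)

# The alert matches addresses tied to more than five nodes
matches = [addr for addr, n in counts.items() if n > 5]
print(matches)  # ['12 Harbour St']
```

<p>In Linkurious Enterprise, the equivalent pattern is expressed as a graph query and re-evaluated automatically as new data arrives.</p>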
<p><iframe src="https://www.youtube.com/embed/A2-7xAg_3ug?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Once the alert is saved, users access a match list and can start investigating the results. Below, we review one of the findings from the alert investigation interface. </span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/zmEd_J3iq-M?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>When looking at a node representing a company, you may want to know which other companies share the same addresses. The answer can be retrieved manually, by expanding and filtering the data. Or it can be retrieved via a graph query, which requires technical skills. With Linkurious Enterprise’s query templates, you can apply pre-formatted graph queries with the click of a button and accelerate your data exploration. Users run query templates by right-clicking a node in the visualization and choosing the desired template from the menu.</span></p>
<p></p>
<p><span>Below is an example of how to set up a query template. We configure it to retrieve, for a given company officer, all the other officers it is connected to via a shared address or a shared company.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/-EfMaVCoAZU?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Once the query is configured, users can easily access and run it from the visualization interface to speed up their investigations.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/ySkVS3FRHS8?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.</span></p>
<p></p>
<p><span>In the next section, we see how to automate the identification of unusual companies within the French network using the PageRank algorithm and Linkurious Enterprise’s alert system.</span></p>
<p></p>
<h3><span>Identifying key nodes with the PageRank algorithm</span></h3>
<p></p>
<p><span>To use graph algorithms in Linkurious Enterprise, you will first need to run them on your backend and save their results as new properties in your graph database. In this example, we show how to identify key nodes in your network using the PageRank algorithm. This centrality algorithm will compute a score assessing the relative importance of various nodes within a network.</span></p>
<p><span>One line of code is enough to run the algorithm in Neo4j and create a new node property, “pagerank_g” with the resulting PageRank score.</span></p>
<p></p>
<table>
<tbody><tr><td><span>// Computation of PageRank<br/></span> <span>CALL algo.pageRank(null, null, {write: true, writeProperty: 'pagerank_g'})</span></td>
</tr>
</tbody>
</table>
<p></p>
<p><span>Once this has been added to our graph, we can start exploiting the results in Linkurious Enterprise.</span></p>
<p><span>We created a new alert, leveraging the PageRank results. The query is simple: it searches for Entity nodes connected to other nodes (Country, Officer, Intermediary) located in France. It also collects their PageRank scores and ranks them by order of importance. Every matching sub-graph is recorded by the alert system and can be investigated. By sorting results by their PageRank scores, we can focus our investigation on the most important companies within the French network.</span></p>
<p></p>
<table>
<tbody><tr class="alt-table-row"><td><span>// Detect French entities with a high PageRank</span><p></p>
<p><span>MATCH (a:Entity)-[r]-(b)<br/></span> <span>WHERE b.countries = "France"<br/></span> <span>WITH a.pagerank_g as score, a, COLLECT(distinct r) as r, COLLECT(distinct b) as b, count(b) as degree<br/></span> <span>RETURN a, score, a.name as name, r, b, degree<br/></span> <span>ORDER BY score DESC</span></p>
</td>
</tr>
</tbody>
</table>
<p></p>
<p><span>In the example below, we review one of the top matches recorded by the alert system. </span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/J2ARFuykM_A?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>In addition to these features, users can rely on Linkurious Enterprise styling and filtering capabilities to analyze the data faster. For instance, it’s possible to size and filter the nodes based on their PageRank score to get a faster understanding of the situation, as depicted in the image below.</span></p>
<p></p>
<div id="attachment_6688" class="wp-caption aligncenter"><img class="lazy size-full wp-image-6688 lazy-loaded" src="https://linkurio.us/wp-content/uploads/2018/09/sizing.png" alt="style and analytics" width="941" height="452"/><p class="wp-caption-text">A size is applied to “location” nodes based on their PageRank score to highlight nodes of importance.</p>
</div>
<p><span>By enriching the data with additional information, the PageRank algorithm helped us focus on nodes of interest. The alert system in Linkurious Enterprise helps us classify the results and provides a user-friendly interface for investigation. In the next section, we see how to detect communities of interest with a single click using the Louvain algorithm and the query template system.</span></p>
<h3><span>Identifying interesting communities via the Louvain modularity</span></h3>
<p></p>
<p><span>In the example below, we implement the Louvain algorithm to identify communities within our network. We look specifically at communities of company officers based on their relationships. The snippet of code below identifies communities and adds a new “communityLouvain” property to each node, representing the community it belongs to.</span></p>
<p></p>
<table>
<tbody><tr><td><span>// Computation of Louvain modularity</span><p></p>
<p><span>CALL algo.louvain(<br/></span> <span> 'MATCH (p:Officer) RETURN id(p) as id',<br/></span> <span> 'MATCH (p1:Officer)-[:OFFICER_OF]-&gt;(:Entity)&lt;-[:OFFICER_OF]-(p2:Officer)<br/></span> <span>  RETURN id(p1) as source, id(p2) as target',<br/></span> <span> {graph: 'cypher', write: true, writeProperty: 'communityLouvain'});</span></p>
</td>
</tr>
</tbody>
</table>
<p></p>
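<p>The Louvain method works by greedily maximizing the modularity of a partition. As a minimal illustration of the quantity being optimized, here is the modularity score itself, computed in plain Python on a toy undirected graph (node and community names invented):</p>

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected graph.

    edges: list of (u, v) undirected edges; community: dict node -> label.
    Q sums, over communities, the intra-community edge fraction minus
    the fraction expected if edges were placed at random by degree.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for label in set(community.values()):
        members = {n for n, c in community.items() if c == label}
        intra = sum(1 for u, v in edges if u in members and v in members)
        deg_sum = sum(degree[n] for n in members)
        q += intra / m - (deg_sum / (2 * m)) ** 2
    return q

# Two triangles joined by a single bridge edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
good = {n: ("left" if n in "abc" else "right") for n in "abcdef"}
print(round(modularity(edges, good), 3))  # 0.357
```

<p>The two-triangle split scores well above zero, while lumping every node into one community scores exactly zero; Louvain searches for the partition with the highest such score.</p>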
<p><span>Then, we leverage the data generated by the algorithm in a query template that retrieves, in a click, for a given “Officer” node, the other officers belonging to the same community. Instead of manually exploring each of the node’s </span><span>neighbors to </span><span>identify a potential community, the query template instantly provides an answer the analysts can then refine. Below is the code used in the query template.</span></p>
<p></p>
<table>
<tbody><tr class="alt-table-row"><td><span>// Retrieve the officer nodes who belong to the same community</span><p></p>
<p><span>MATCH (a:Officer)<br/></span> <span>WHERE ID(a) = {{"officer":node:"Officer"}}<br/></span> <span>WITH a<br/></span> <span>MATCH p = (a:Officer)-[*..4]-(b:Officer)<br/></span> <span>WHERE a.communityLouvain = b.communityLouvain<br/></span> <span>RETURN p</span></p>
</td>
</tr>
</tbody>
</table>
<p></p>
<p><span>We can now retrieve, in a click, officers of the same community from any given officer in the visualization interface. In the example below, we apply this to Boris Rotemberg, a Russian oligarch, opening an investigation on his close connections. Once the results of the query are displayed, styles and filters are essential to refine the results, reduce the noise and highlight the key elements.</span></p>
<p></p>
<p><iframe src="https://www.youtube.com/embed/ZlO-4Kif1Bo?wmode=opaque" width="560" height="315" frameborder="0" allowfullscreen="allowfullscreen"></iframe>
</p>
<p></p>
<p><span>Graph analytics and graph visualization are complementary. The existing graph analytics tools and methods make it possible to extract information from large amounts of connected data, generating valuable insights.</span></p>
<p></p>
<p>With platforms like Linkurious Enterprise, every user can take advantage of graph analytics from their browser via an intuitive interface. From detecting financial crimes, such as money laundering or tax evasion, to spotting fraud, or fighting organized crime, analysts find the insights they need.</p>



Free Book: Applied Stochastic Processes tag:www.analyticbridge.datasciencecentral.com,2018-09-08:2004291:BlogPost:388037
2018-09-08T17:16:14.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville

<p><span>Full title: <em>Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems</em>. Published June 2, 2018. Author: Vincent Granville, PhD. (104 pages, 16 chapters.)</span></p>
<p><span>This book is intended for professionals in data science, computer science, operations research, statistics, machine learning, big data, and mathematics. In 100 pages, it covers many new topics, offering a fresh perspective on the subject. It is accessible to practitioners with a two-year college-level exposure to statistics and probability. The compact and tutorial style, featuring many applications (Blockchain, quantum algorithms, HPC, random number generation, cryptography, Fintech, web crawling, statistical testing) with numerous illustrations, is aimed at practitioners, researchers and executives in various quantitative fields.</span></p>
<p><span><a href="https://storage.ning.com/topology/rest/1.0/file/get/2220289711?profile=original" target="_self"><img src="https://storage.ning.com/topology/rest/1.0/file/get/2220289711?profile=original" width="298" class="align-center"/></a></span></p>
<p><span>New ideas, advanced topics, and state-of-the-art research are discussed in simple English, without using jargon or arcane theory. It unifies topics that are usually part of different fields (data science, operations research, dynamical systems, computer science, number theory, probability) broadening the knowledge and interest of the reader in ways that are not found in any other book. This short book contains a large amount of condensed material that would typically be covered in 500 pages in traditional publications. Thanks to cross-references and redundancy, the chapters can be read independently, in random order.</span></p>
<p><span>This book is available for Data Science Central members exclusively. The text in blue consists of clickable links that provide the reader with additional references. Source code and Excel spreadsheets summarizing computations are also accessible as hyperlinks for easy copy-and-paste or replication purposes. The most recent version of this book is available <a href="https://www.datasciencecentral.com/page/free-books-1">from this link</a>, accessible to DSC members only.</span></p>
<p><span><strong>About the author</strong></span></p>
<p><span>Vincent Granville is a start-up entrepreneur, patent owner, author, investor, pioneering data scientist with 30 years of corporate experience in companies small and large (eBay, Microsoft, NBC, Wells Fargo, Visa, CNET) and a former VC-funded executive, with a strong academic and research background including Cambridge University.</span></p>
<div><a href="https://storage.ning.com/topology/rest/1.0/file/get/2058338992?profile=original" target="_self"><img src="https://storage.ning.com/topology/rest/1.0/file/get/2058338992?profile=original" width="750" class="align-center"/></a></div>
<p><span><strong>Download the book (members only)</strong></span></p>
<p><span><a href="https://www.datasciencecentral.com/page/free-books-1" target="_blank" rel="noopener">Click here</a> to get the book. For Data Science Central members only. </span><span>If you have any issues accessing the book please contact us at info@datasciencecentral.com.</span></p>
<div><a href="https://storage.ning.com/topology/rest/1.0/file/get/2058338992?profile=original" target="_self"><img src="https://storage.ning.com/topology/rest/1.0/file/get/2058338992?profile=original" width="750" class="align-center"/></a></div>
<p><span><strong>Content</strong></span></p>
<p><span>The book covers the following topics:</span><span> </span></p>
<p><span><strong>1. Introduction to Stochastic Processes</strong></span></p>
<p><span>We introduce these processes, used routinely by Wall Street quants, with a simple approach consisting of re-scaling random walks to make them time-continuous, with a finite variance, based on the central limit theorem.</span></p>
<ul>
<li><span>Construction of Time-Continuous Stochastic Processes</span></li>
<li><span>From Random Walks to Brownian Motion</span></li>
<li><span>Stationarity, Ergodicity, Fractal Behavior</span></li>
<li><span>Memory-less or Markov Property</span></li>
<li><span>Non-Brownian Process</span></li>
</ul>
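<p>The re-scaling idea can be sketched numerically: rescale an n-step ±1 random walk by 1/sqrt(n) and, per the central limit theorem, the endpoint's variance approaches 1, as for a standard Brownian motion at time 1. A plain-Python illustration with a fixed seed (parameters chosen arbitrarily):</p>

```python
import random
from math import sqrt

random.seed(42)

def rescaled_walk_endpoint(n):
    """Endpoint at time 1 of a +-1 random walk rescaled by 1/sqrt(n)."""
    position = sum(random.choice((-1, 1)) for _ in range(n))
    return position / sqrt(n)

# Empirical variance of the rescaled endpoint over many walks:
# the central limit theorem says it approaches 1
samples = [rescaled_walk_endpoint(1000) for _ in range(2000)]
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(variance, 2))  # close to 1
```

<p>The same rescaling, applied to the whole path rather than just the endpoint, is what produces a time-continuous process with finite variance.</p>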
<p><span><strong>2. Integration, Differentiation, Moving Averages</strong></span></p>
<p><span>We introduce more advanced concepts about stochastic processes. Yet we make these concepts easy to understand even to the non-expert. This is a follow-up to Chapter 1.</span></p>
<ul>
<li><span>Integrated, Moving Average and Differential Process</span></li>
<li><span>Proper Re-scaling and Variance Computation</span></li>
<li><span>Application to Number Theory Problem</span></li>
</ul>
<p><span><strong>3. Self-Correcting Random Walks</strong></span></p>
<p><span>We investigate here a breed of stochastic processes that are different from the Brownian motion, yet are better models in many contexts, including Fintech.</span><span> </span></p>
<ul>
<li><span>Controlled or Constrained Random Walks</span></li>
<li><span>Link to Mixture Distributions and Clustering</span></li>
<li><span>First Glimpse of Stochastic Integral Equations</span></li>
<li><span>Link to Wiener Processes, Application to Fintech</span></li>
<li><span>Potential Areas for Research</span></li>
<li><span>Non-stochastic Case</span></li>
</ul>
<p><span><strong>4. Stochastic Processes and Tests of Randomness</strong></span></p>
<p><span>In this transition chapter, we introduce a different type of stochastic process, with number theory and cryptography applications, analyzing statistical properties of numeration systems along the way — a recurrent theme in the next chapters, offering many research opportunities and applications. While we are dealing with deterministic sequences here, they behave very much like stochastic processes, and are treated as such. Statistical testing is central to this chapter, introducing tests that will also be used in the last chapters.</span></p>
<ul>
<li><span>Gap Distribution in Pseudo-Random Digits</span></li>
<li><span>Statistical Testing and Geometric Distribution</span></li>
<li><span>Algorithm to Compute Gaps</span></li>
<li><span>Another Application to Number Theory Problem</span></li>
<li><span>Counter-Example: Failing the Gap Test</span></li>
</ul>
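<p>The gap computation at the heart of this chapter is easy to sketch: record how many digits separate successive occurrences of a target digit. For truly random decimal digits, these gaps follow a geometric distribution with p = 1/10. A plain-Python illustration on the first 30 decimal digits of Pi:</p>

```python
def gaps(digits, target):
    """Gaps (number of digits skipped) between successive occurrences
    of `target` in the sequence `digits`."""
    positions = [i for i, d in enumerate(digits) if d == target]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]

# First 30 decimal digits of Pi (after the leading 3)
pi_digits = "141592653589793238462643383279"
print(gaps(pi_digits, "3"))  # [5, 1, 6, 0, 1]
```

<p>Comparing the empirical gap distribution against the geometric distribution is the basis of the statistical test described in this chapter.</p>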
<p><span><strong>5. Hierarchical Processes</strong></span></p>
<p><span>We start discussing random number generation, and numerical and computational issues in simulations, applied to an original type of stochastic process. This will become a recurring theme in the next chapters, as it applies to many other processes.</span></p>
<ul>
<li><span>Graph Theory and Network Processes</span></li>
<li><span>The Six Degrees of Separation Problem</span></li>
<li><span>Programming Languages Failing to Produce Randomness in Simulations</span></li>
<li><span>How to Identify and Fix the Previous Issue</span></li>
<li><span>Application to Web Crawling</span></li>
</ul>
<p><span><strong>6. Introduction to Chaotic Systems</strong></span></p>
<p><span>While typically studied in the context of dynamical systems, the logistic map can be viewed as a stochastic process, with an equilibrium distribution and probabilistic properties, just like numeration systems (next chapters) and processes introduced in the first four chapters.</span></p>
<ul>
<li><span>Logistic Map and Fractals</span></li>
<li><span>Simulation: Flaws in Popular Random Number Generators</span></li>
<li><span>Quantum Algorithms</span></li>
</ul>
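<p>The logistic map in question is the recursion x → r·x·(1 − x), fully chaotic at r = 4. A minimal plain-Python sketch of its hallmark property, sensitive dependence on initial conditions (seed values chosen arbitrarily):</p>

```python
def logistic_orbit(x0, r=4.0, n=40):
    """First n iterates of the logistic map x -> r*x*(1-x)."""
    orbit = [x0]
    for _ in range(n):
        orbit.append(r * orbit[-1] * (1 - orbit[-1]))
    return orbit

# Two seeds differing by one billionth diverge completely
a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-9)
print(max(abs(x - y) for x, y in zip(a, b)))  # far larger than 1e-9
```

<p>This sensitivity is also why floating-point round-off corrupts simulations so quickly, the flaw in popular random number generators discussed in this chapter.</p>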
<p><span><strong>7. Chaos, Logistic Map and Related Processes</strong></span></p>
<p><span>We study processes related to the logistic map, including a special logistic map discussed here for the first time, with a simple equilibrium distribution. This chapter offers a transition between chapter 6 and the next chapters on numeration systems (the logistic map being one of them).</span></p>
<ul>
<li><span>General Framework</span></li>
<li><span>Equilibrium Distribution and Stochastic Integral Equation</span></li>
<li><span>Examples of Chaotic Sequences</span></li>
<li><span>Discrete, Continuous Sequences and Generalizations</span></li>
<li><span>Special Logistic Map</span></li>
<li><span>Auto-regressive Time Series</span></li>
<li><span>Literature</span></li>
<li><span>Source Code with Big Number Library</span></li>
<li><span>Solving the Stochastic Integral Equation: Example</span></li>
</ul>
<p><span><strong>8. Numerical and Computational Issues</strong></span></p>
<p><span>These issues have been mentioned in chapter 7, and also appear in chapters 9, 10 and 11. Here we take a deeper dive and offer solutions, using high precision computing with BigNumber libraries. </span></p>
<ul>
<li><span>Precision Issues when Simulating, Modeling, and Analyzing Chaotic Processes</span></li>
<li><span>When Precision Matters, and when it does not</span></li>
<li><span>High Precision Computing (HPC)</span></li>
<li><span>Benchmarking HPC Solutions</span></li>
<li><span>How to Assess the Accuracy of your Simulation Tool</span></li>
</ul>
<p><span><strong>9. Digits of Pi, Randomness, and Stochastic Processes</strong></span></p>
<p><span>Deep mathematical and data science research, including a result about the randomness of Pi, is presented here without arcane terminology or complicated equations. The numeration systems discussed here are a particular case of deterministic sequences behaving just like the stochastic processes investigated earlier; the logistic map is itself one such system.</span></p>
<ul>
<li><span>Application: Random Number Generation</span></li>
<li><span>Chaotic Sequences Representing Numbers</span></li>
<li><span>Data Science and Mathematical Engineering</span></li>
<li><span>Numbers in Base 2, 10, 3/2 or Pi</span></li>
<li><span>Nested Square Roots and Logistic Map</span></li>
<li><span>About the Randomness of the Digits of Pi</span></li>
<li><span>The Digits of Pi are Randomly Distributed in the Logistic Map System</span></li>
<li><span>Paths to Proving Randomness in the Decimal System</span></li>
<li><span>Connection with Brownian Motions</span></li>
<li><span>Randomness and the Bad Seeds Paradox</span></li>
<li><span>Application to Cryptography, Financial Markets, Blockchain, and HPC</span></li>
<li><span>Digits of Pi in Base Pi</span></li>
</ul>
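<p>The numeration systems above share the same digit recursion: in base b, the next digit is floor(b·x) and the next state is the fractional part of b·x. A plain-Python sketch that works for integer and non-integer bases alike (seeds and bases chosen purely for illustration):</p>

```python
from math import floor, pi

def digits_in_base(x, b, n=10):
    """First n digits of x in (possibly non-integer) base b, for x in [0, 1)."""
    result = []
    for _ in range(n):
        x *= b
        d = floor(x)
        result.append(d)
        x -= d  # keep the fractional part for the next digit
    return result

print(digits_in_base(0.5, 2, 4))      # binary: [1, 0, 0, 0]
print(digits_in_base(pi - 3, 10, 8))  # decimal digits of Pi: [1, 4, 1, 5, 9, 2, 6, 5]
print(digits_in_base(0.5, 1.5, 8))    # base 3/2: digits stay in {0, 1}
```

<p>Viewing this recursion as a dynamical system on [0, 1) is exactly what lets the digit sequences be studied as stochastic processes, with the logistic map as a closely related special case.</p>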
<p><span><strong>10. Numeration Systems in One Picture</strong></span></p>
<p><span>Here you will find a summary of much of the material previously covered on chaotic systems, in the context of numeration systems (in particular, chapters 7 and 9).</span></p>
<ul>
<li><span>Summary Table: Equilibrium Distribution, Properties</span></li>
<li><span>Reverse-engineering Number Representation Systems</span></li>
<li><span>Application to Cryptography</span></li>
</ul>
<p><span><strong>11. Numeration Systems: More Statistical Tests and Applications</strong></span></p>
<p><span>In addition to featuring new research results and building on the previous chapters, the topics discussed here offer a great sandbox for data scientists and mathematicians.</span><span> </span></p>
<ul>
<li><span>Components of Number Representation Systems</span></li>
<li><span>General Properties of these Systems</span></li>
<li><span>Examples of Number Representation Systems</span></li>
<li><span>Examples of Patterns in Digits Distribution</span></li>
<li><span>Defects found in the Logistic Map System</span></li>
<li><span>Test of Uniformity</span></li>
<li><span>New Numeration System with no Bad Seed</span></li>
<li><span>Holes, Autocorrelations, and Entropy (Information Theory)</span></li>
<li><span>Towards a more General, Better, Hybrid System</span></li>
<li><span>Faulty Digits, Ergodicity, and High Precision Computing</span></li>
<li><span>Finding the Equilibrium Distribution with the Percentile Test</span></li>
<li><span>Central Limit Theorem, Random Walks, Brownian Motions, Stock Market Modeling</span></li>
<li><span>Data Set and Excel Computations</span></li>
</ul>
<p><span><strong>12. The Central Limit Theorem Revisited</strong></span></p>
<p><span>The central limit theorem explains the convergence of discrete stochastic processes to Brownian motions, and has been cited a few times in this book. Here we also explore a version that applies to deterministic sequences. Such sequences are treated as stochastic processes in this book.</span></p>
<ul>
<li><span>A Special Case of the Central Limit Theorem</span></li>
<li><span>Simulations, Testing, and Conclusions</span></li>
<li><span>Generalizations</span></li>
<li><span>Source Code</span></li>
</ul>
<p><span><strong>13. How to Detect if Numbers are Random or Not</strong></span></p>
<p><span>We explore here some deterministic sequences of numbers, behaving like stochastic processes or chaotic systems, together with another interesting application of the central limit theorem.</span></p>
<ul>
<li><span>Central Limit Theorem for Non-Random Variables</span></li>
<li><span>Testing Randomness: Max Gap, Auto-Correlations and More</span></li>
<li><span>Potential Research Areas</span></li>
<li><span>Generalization to Higher Dimensions</span></li>
</ul>
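Tests like the ones listed above can be sketched in a few lines of Python. The snippet below is a toy illustration, not the book's actual methodology: it computes the maximum gap between occurrences of a digit and the lag-1 autocorrelation of a digit stream produced by a simple linear congruential generator (the constants and the mod-10 digit extraction are my own choices for the example).

```python
def max_gap(positions, n):
    """Largest run between successive occurrences of a digit in a sequence of length n."""
    if not positions:
        return n
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    gaps.append(pts[0])           # run before the first occurrence
    gaps.append(n - 1 - pts[-1])  # run after the last occurrence
    return max(gaps)

def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a numeric sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

# A deterministic digit stream from a linear congruential generator (glibc constants).
# The low bits of such generators are known to be weak, so some structure is expected.
seq, x = [], 1
for _ in range(10_000):
    x = (1103515245 * x + 12345) % 2**31
    seq.append(x % 10)

print("max gap for digit 7:", max_gap([i for i, d in enumerate(seq) if d == 7], len(seq)))
print("lag-1 autocorrelation:", lag1_autocorrelation(seq))
```

A truly random digit stream should show a lag-1 autocorrelation close to zero; a markedly nonzero value is evidence of a defect in the numeration or generation system.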
<p><span><strong>14. Arrival Time of Extreme Events in Time Series</strong></span></p>
<p><span>Time series, as discussed in the first chapters, are also stochastic processes. Here we discuss a topic rarely investigated in the literature: the arrival times, as opposed to the extreme values (a classic topic), associated with extreme events in time series.</span></p>
<ul>
<li><span>Simulations</span></li>
<li><span>Theoretical Distribution of Records over Time</span></li>
</ul>
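As a quick illustration of the record-arrival topic (my sketch, not material from the chapter): in an i.i.d. sequence, the k-th observation sets a new record with probability 1/k, so the expected number of records among n observations is the harmonic number H(n). A short simulation makes this visible.

```python
import random

def record_times(xs):
    """1-based indices at which the running maximum is broken (arrival times of records)."""
    times, best = [], float("-inf")
    for i, x in enumerate(xs, start=1):
        if x > best:
            best = x
            times.append(i)
    return times

random.seed(42)  # fixed seed so the simulation is reproducible
n, trials = 1000, 200
avg_records = sum(len(record_times([random.random() for _ in range(n)]))
                  for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
print(f"average records: {avg_records:.2f}, harmonic number H({n}): {harmonic:.2f}")
```

The two printed numbers should agree closely, confirming the 1/k arrival-probability argument.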
<p><span><strong>15. Miscellaneous Topics</strong></span></p>
<p><span>We investigate topics related to time series as well as other popular stochastic processes such as spatial processes.</span></p>
<ul>
<li><span>How and Why: Decorrelate Time Series</span></li>
<li><span>A Weird Stochastic-Like, Chaotic Sequence</span></li>
<li><span>Stochastic Geometry, Spatial Processes, Random Circles: Coverage Problem</span></li>
<li><span>Additional Reading (Including Twin Points in Point Processes)</span></li>
</ul>
<p><span><strong>16. Exercises</strong></span></p>



Run Deep learning models for free using google colaboratory tag:www.analyticbridge.datasciencecentral.com,2018-10-01:2004291:BlogPost:388708
2018-10-01T15:07:17.000Z


suresh kumar Gorakala
https://www.analyticbridge.datasciencecentral.com/profile/sureshkumarGorakala



<h3>What is Google Colab:</h3>
<p><br/><span>We all know that deep learning algorithms improve the accuracy of AI applications to a great extent. But that accuracy comes at the cost of heavy computational hardware such as GPUs for developing deep learning models. Many machine learning developers cannot afford a GPU, as they are very costly, and find this a roadblock to learning and developing deep learning applications. To help them, Google has released a free cloud-based service, Google Colaboratory: a Jupyter notebook environment with free GPU processing capabilities and no strings attached. It is a ready-to-use service that requires no setup at all. </span><br/><br/><span>Any AI developer can use this free service to develop deep learning applications using popular AI libraries like TensorFlow, PyTorch, Keras, etc.</span></p>
<p></p>
<h3>Setting up colab:</h3>
<p><br/><i>Go to Google Drive → New → More → Colaboratory<span> </span></i><br/><br/></p>
<div class=”separator”><a href=”https://1.bp.blogspot.com/-S6WyVM9ZEfQ/W5QLeuTPwaI/AAAAAAAAGUA/geEtAT20lNUpH5ruI4wUqL4KPlh36WBbgCLcBGAs/s1600/google%2Bcolab%2Bintro.png”><img border=”0″ height=”188″ src=”https://1.bp.blogspot.com/-S6WyVM9ZEfQ/W5QLeuTPwaI/AAAAAAAAGUA/geEtAT20lNUpH5ruI4wUqL4KPlh36WBbgCLcBGAs/s320/google%2Bcolab%2Bintro.png” width=”320″ class=”align-center”/></a></div>
<p><br/><span>This opens a Python Jupyter notebook in the browser.</span><br/><br/></p>
<div class=”separator”><a href=”https://2.bp.blogspot.com/-x4vpgKvfLio/W5QLjzuZd8I/AAAAAAAAGUE/XedcX87NjQ0bH-j7afeIp3PX3XGxHnHEgCLcBGAs/s1600/colab%2Bpython%2Bnotebook.png”><img border=”0″ height=”75″ src=”https://2.bp.blogspot.com/-x4vpgKvfLio/W5QLjzuZd8I/AAAAAAAAGUE/XedcX87NjQ0bH-j7afeIp3PX3XGxHnHEgCLcBGAs/s320/colab%2Bpython%2Bnotebook.png” width=”320″ class=”align-center”/></a></div>
<p><br/><span>By default, the Jupyter notebook runs Python 2.7 on a CPU. We can change the Python version to 3.6 and the processing hardware to GPU in the settings, as shown below:</span><br/><br/><i>Go to Runtime → Change runtime type<span> </span></i><br/><br/></p>
<div class=”separator”><a href=”https://3.bp.blogspot.com/-VJzJTIqn7vE/W5QLoz79ZFI/AAAAAAAAGUI/w_Ead9jgisgZtqZxzWlwCPcJ9taPMjWcgCLcBGAs/s1600/colab%2Bgpu%2Bruntime.png”><img border=”0″ height=”192″ src=”https://3.bp.blogspot.com/-VJzJTIqn7vE/W5QLoz79ZFI/AAAAAAAAGUI/w_Ead9jgisgZtqZxzWlwCPcJ9taPMjWcgCLcBGAs/s320/colab%2Bgpu%2Bruntime.png” width=”320″ class=”align-center”/></a></div>
<p><br/><span>This opens the Notebook settings pop-up, where we can change the runtime type to Python 3.6 and the hardware accelerator to GPU.</span><br/><br/></p>
<div class=”separator”><a href=”https://1.bp.blogspot.com/-YfE2Rc19ouU/W5QLuDr0giI/AAAAAAAAGUM/k-pRP6V1BLEqOYDGm4N7nWdf-46aVicJACLcBGAs/s1600/python%2Bcolab%2Bgpu.png”><img border=”0″ height=”128″ src=”https://1.bp.blogspot.com/-YfE2Rc19ouU/W5QLuDr0giI/AAAAAAAAGUM/k-pRP6V1BLEqOYDGm4N7nWdf-46aVicJACLcBGAs/s320/python%2Bcolab%2Bgpu.png” width=”320″ class=”align-center”/></a></div>
<p><br/><span>Bingo, your Python environment with the processing power of a GPU is ready to use.</span><br/><br/><b>Important things to remember:</b><span> </span></p>
<ul>
<li>The supported browsers are Chrome and Firefox</li>
<li>Currently only Python is supported</li>
<li>We can use up to 12 hours of processing time in one go</li>
</ul>
<p><span>Let’s check that our newly created Jupyter notebook works as expected. Run the commands below and see if we get the expected results. </span><br/><br/></p>
<div class=”separator”><a href=”https://3.bp.blogspot.com/-A7P6JXA656k/W5QL0FW427I/AAAAAAAAGUY/iHX8814IkbkC-L91S8kTPrwLjWkuh7HQQCLcBGAs/s1600/python%2Bcolab%2Bgoogle%2Bai.png”><img border=”0″ height=”115″ src=”https://3.bp.blogspot.com/-A7P6JXA656k/W5QL0FW427I/AAAAAAAAGUY/iHX8814IkbkC-L91S8kTPrwLjWkuh7HQQCLcBGAs/s320/python%2Bcolab%2Bgoogle%2Bai.png” width=”320″ class=”align-center”/></a></div>
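The exact commands from the screenshot aren't reproduced here, but a first-cell sanity check along these lines (my sketch, assuming NumPy is pre-installed, which Colab normally guarantees) serves the same purpose:

```python
import sys
import numpy as np

# Confirm which Python version the runtime is using
print(sys.version.split()[0])

# A tiny computation to confirm the scientific stack imports and works
a = np.arange(10)
print("sum 0..9 =", a.sum())  # expect 45
```

If both lines print without errors, the notebook environment is working.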
<p><br/><span>By default, the most frequently used Python libraries, such as NumPy, pandas, SciPy, scikit-learn and Matplotlib, are pre-installed when we create a notebook. Below we can see a plotting example.</span></p>
<div class=”separator”><a href=”https://3.bp.blogspot.com/-IOBqLxS9lyQ/W5QL7FO9ODI/AAAAAAAAGUg/_BqQ-wfrgiw1J7Uc1PXrqpa8UG7dC6HVQCLcBGAs/s1600/google%2Bpython%2Bcolab%2Bai.png”><img border=”0″ height=”137″ src=”https://3.bp.blogspot.com/-IOBqLxS9lyQ/W5QL7FO9ODI/AAAAAAAAGUg/_BqQ-wfrgiw1J7Uc1PXrqpa8UG7dC6HVQCLcBGAs/s320/google%2Bpython%2Bcolab%2Bai.png” width=”320″ class=”align-center”/></a></div>
<div class=”separator”></div>
<div class=”separator”><h3>For a machine learning example, <a href=”http://www.dataperspective.info/2018/09/getting-started-with-google-laboratory-deep-learning.html” target=”_blank” rel=”noopener”>see here</a></h3>
</div>



Who cares if unsupervised machine learning is supervised learning in disguise? tag:www.analyticbridge.datasciencecentral.com,2018-09-23:2004291:BlogPost:388689
2018-09-23T19:34:28.000Z


Danko Nikolic
https://www.analyticbridge.datasciencecentral.com/profile/DankoNikolic



<p><span>Previously, we saw how unsupervised learning actually <a href=”https://www.analyticbridge.datasciencecentral.com/profiles/blogs/supervised-learning-in-disguise-the-truth-about-unsupervised” target=”_blank” rel=”noopener”>has built-in supervision</a>, albeit hidden from the user.</span></p>
<p><span>In this post we will see how supervised and unsupervised learning algorithms share more in common than the textbooks would suggest. As a matter of fact, both classes can use identical equations for creating mathematical models of the data, and both can use identical learning algorithms to find optimal parameter values for those models.</span></p>
<p><span>The consequence of this relation is that one can easily transform a supervised learning method into an unsupervised one, and vice versa. The only change you need to do is determine how Y will be computed; that is, you have to decide how your error for learning (training) will be defined.</span></p>
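This point can be made concrete with a toy sketch (mine, not the author's): the same gradient-descent loop fits the same linear model, and only the definition of the target Y decides whether learning is supervised (external labels) or unsupervised (reconstruct the input, autoencoder-style). All data and parameters below are invented for illustration.

```python
import numpy as np

def train_linear(X, Y, lr=0.05, steps=2000):
    """Plain gradient descent on W minimizing ||X W - Y||^2 (no bias term, for brevity)."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(X.shape[1], Y.shape[1]))
    for _ in range(steps):
        err = X @ W - Y              # the "error for learning" -- the only moving part
        W -= lr * X.T @ err / len(X)
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Supervised: Y comes from outside the algorithm (labels / measured responses)
y_labels = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.01 * rng.normal(size=(200, 1))
W_sup = train_linear(X, y_labels)

# Unsupervised: Y is defined as the input itself (reconstruction target)
W_unsup = train_linear(X, X)

print(W_sup.ravel())                          # recovers roughly [1, -2, 0.5]
print(np.allclose(W_unsup, np.eye(3), atol=0.05))
```

The training loop never knows which regime it is in; only the construction of Y differs.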
<p><span>You may not have noticed it so far, but the general linear model (GLM) has been used as a versatile model, with a versatile set of learning methods, to create various supervised and unsupervised learning methods.</span></p>
<p><span>When one thinks of GLM, probably the first methods that come to mind are regression and inferential statistics (e.g., ANOVA), both of which fall into the category of supervised learning. However, GLM has been used just as extensively in unsupervised setups. This relates to dimensionality reduction techniques in which the algorithm is not being told with which dimensions particular data points are being saturated. Rather, the algorithm is left to “discover” on its own those dimensions. <a href=”https://en.wikipedia.org/wiki/Principal_component_analysis”>Principal component analysis (PCA)</a> and various forms of <a href=”https://en.wikipedia.org/wiki/Factor_analysis”>factor analyses</a> are all examples of unsupervised applications of GLM.</span></p>
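As an illustrative sketch (not from the article), the same design matrix feeds both uses of the linear machinery: ordinary least squares when responses are supplied, and PCA, via the SVD of the centered data, when no responses exist. The data and coefficients are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + 0.1 * rng.normal(size=300)

# Supervised use of the linear model: regression coefficients from labeled responses
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unsupervised use of the same machinery: principal components of the centered data
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)   # fraction of variance per component

print("regression coefficients:", np.round(beta, 2))
print("variance explained by each component:", np.round(explained, 2))
```

In the first case the "dimensions" of interest are given by y; in the second, the algorithm discovers them on its own, exactly as the text describes.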
<p><span>This easy jump from supervised to unsupervised is not just a property of simple models such as GLM. Exactly the same applies to computationally elaborate methods such as <a href=”https://en.wikipedia.org/wiki/Deep_learning”>deep learning neural networks</a>. A neural network can easily be set up to operate with or without supervision; the most commonly known applications are supervised ones, such as image recognition, in which humans initially provide labels for the categories to which each image belongs. The network then learns that assignment and, if everything is done right, is capable of correctly classifying new images from those trained categories (e.g., distinguishing human faces from houses, from tools, etc.).</span></p>
<p><span>Neural networks can be used just as efficiently in an unsupervised learning setup. Perhaps the most common examples are auto-encoders, which are capable of detecting anomalies in data. Here, the network is trained to produce an output that has exactly the same values as the inputs it receives. The difference between what it has generated and what it should have generated, i.e. the error, is used to adjust its synaptic weights. The training continues until the network can do the job satisfactorily on data that have not been used for training (i.e., a test data set).</span></p>
<p><span>What makes this learning non-trivial is that the topology of the neural network is made such that at least one of the hidden layers has a smaller number of units than the number of units in the input (and output) layer(s). This forces the network to find a representation of the data with reduced dimensionality, similar to that performed by PCA and factor analyses.</span></p>
<p><span>Such networks are useful for applications in which labels possibly do not exist, or would be impractically difficult to obtain. Also, they can be very useful for applications in which collection of labels may take years, such as for example, <a href=”https://en.wikipedia.org/wiki/Fraud#Detection”>fraud detection</a> and <a href=”https://en.wikipedia.org/wiki/Predictive_maintenance”>predictive maintenance</a>.</span></p>
<p><span>A piece of advice to data scientists: don’t be afraid to turn your supervised learning method into an unsupervised one or vice versa, if you see that this fits your problem. You will need some creative thinking and more coding than usual but as a result, you may end up with exactly the solution that the task you are solving requires.</span></p>
<p><span>Here is one general rule to keep in mind: supervised learning methods will always be capable of solving a wider range of real-life problems than unsupervised ones. This is because unsupervised ones are much more specialized: their error computation is already determined by the algorithm. In addition, that error computation is limited to whatever can be extracted from the input data. In contrast, supervised methods, being open to error data coming from the outside world, can basically take advantage of the errors “computed” by the entire external universe – including the physical events underlying the actual phenomenon that these methods are trying to model (e.g., the real physical event of a machine breaking down provides the training information for a predictive model of whether a machine will soon break down).</span></p>
<p><span>All other things being equal, supervised methods will require less data and computational power to achieve a similar result. Unsupervised algorithms can learn to classify objects, for example <a href=”https://arxiv.org/pdf/1112.6209v5.pdf”>cats</a>. But this comes at the expense of far more resources than a supervised equivalent needs. In the case of Google’s algorithm that discovered cats in images, it took 10 million images, 1 billion connections, 16,000 computer cores, three days of computation and a team of eight scientists from Google and Stanford. That’s a lot of resources.</span></p>
<p><span>In conclusion, we now know the terms ‘supervised’ and ‘unsupervised’ may be misleading, as there is quite a bit of supervision in unsupervised learning. Maybe a better analogy would be if supervised learning was referred to as ‘micro-managed learning’, and instead of unsupervised learning we used the term ‘macro-managed learning’. These two would probably better describe what is actually happening in the background of the respective algorithms.</span></p>
<p><span>Knowing that supervised and unsupervised methods can be seen as two different applications of the same general set of tools can be quite useful for creative problem solving in data science. By assuming a bit of an inventive attitude, one can relatively effortlessly convert an existing method from one form to another, as circumstances require.</span></p>



Graph-based intelligence analysis tag:www.analyticbridge.datasciencecentral.com,2018-08-13:2004291:BlogPost:387314
2018-08-13T11:30:00.000Z


Elise Devaux
https://www.analyticbridge.datasciencecentral.com/profile/EliseDevaux



<p><span>For decades, the intelligence community has been collecting and analyzing information to produce timely and actionable insights for intelligence consumers. But as the amount of information collected increases, analysts are facing new challenges in terms of data processing and analysis. In this article, we explore the possibilities that graph technology is offering for intelligence analysis.</span></p>
<h2><span style=”font-size: 18pt;”>Intelligence collection and analysis in the age of big data</span></h2>
<p><span>The digital age brought new possibilities for Intelligence, Surveillance, and Reconnaissance across both traditional and new intelligence sources. The possibilities of collection within each discipline have widened. For instance, the <a href=”https://en.wikipedia.org/wiki/Open-source_intelligence”>Open Source Intelligence (OSINT) collection channels</a> multiplied with the Internet, providing access to valuable new information. The generalization of digital technologies also extended the production, and thus the collection possibilities, with users generating and sharing content from portable devices anywhere in the world.</span></p>
<div class=”wp-caption aligncenter”><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220285977?profile=original” target=”_self”><img width=”650″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220285977?profile=RESIZE_1024x1024″ width=”650″ class=”align-center”/></a><div id=”attachment_6592″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”><em>Various OSINT information sources</em></p>
</div>
<p></p>
</div>
<p><span>But those changes come at a cost for analysts:</span></p>
<ol>
<li><span>More data requires </span><b>additional processing efforts</b><span> because unprocessed data is unexploitable and thus worthless. And new data is constantly being collected, requiring regular updates and processing.</span></li>
<li><span>Once processed, those data collections are often</span><b><span> </span>sizable</b><span>. This complicates and slows down the identification of relevant pieces of data, and can even make it impossible when tagging or indexing is imperfect.</span></li>
<li><span>Thirdly, </span><b>data is disparate and scattered<span> </span></b><span>across silos</span><b>.</b><span> It comes in a wide variety of formats and types that are handled differently. This diversity turns multi-intelligence, or all-source, analysis into a tedious task: how do you draw a connection between two, ten, or a hundred data pieces when they live in different tools?</span></li>
</ol>
<p><span>This has a direct impact on the analysis. It’s difficult and time-consuming to handle those large, dynamic and varied data assets. And in the meantime, the complexity of threats remains the same. To identify them, analysts must be able to cross-check various data assets in order to spot the key elements and patterns that will produce actionable intelligence.</span></p>
<h2><span style=”font-size: 18pt;”>Graph technology to support intelligence analysis</span></h2>
<p><span>To renew and improve the traditional intelligence cycle, intelligence producers are turning to new tools and methods. Among those tools, we find graph technology. The underlying approach allows analysts to rapidly access relevant data and sift through large heterogeneous collections to find the small subset that holds high-value information.</span></p>
<p><span>The graph technology approach relies on a model in which you deal with data as a network. Information is stored as nodes, connected to each other by edges representing their relationships. This is actually a natural way to think about intelligence data: whether it’s people, telecommunications or events, the elements often form networks in which they are linked to each other.</span></p>
<h3><span>Graph database: gathering your data in a single model</span></h3>
<p><span>Graph and RDF databases are optimized for the storage of connected data. They emerged as an answer to the limitations of traditional databases. Relational databases were designed to codify and store tabular structures. While they are very good at that, they do not perform well when it comes to handling large volumes of connected data. Graph databases, on the other hand, offer several advantages over traditional technology when it comes to connected data:</span></p>
<ul>
<li><span><strong>Performance</strong>: these systems are designed to handle data relationships, and they greatly improve performance when querying data connections.</span></li>
<li><span><strong>Flexibility</strong>: graph databases easily accommodate rapidly scaling data. You can enrich or change the data architecture as the organization’s requirements evolve.</span></li>
</ul>
<p><span>Popular graph storage vendors include </span><span><a href=”https://linkurio.us/solution/datastax/”>DataStax</a>, <a href=”https://linkurio.us/solution/janusgraph/”>JanusGraph</a>, <a href=”https://linkurio.us/solution/neo4j/”>Neo4j </a>or <a href=”https://linkurio.us/solution/stardog/”>Stardog</a></span><span>. These systems </span><a href=”https://db-engines.com/en/ranking_trend/graph+dbms”><span>developed widely</span></a><span> over the last decade, responding to the growing need for technical solutions among organizations working with connected data at scale.</span></p>
<p><span>With graph technology, you can combine multi-dimensional data, including time series, demographic or geographic data. It aggregates data from multiple sources and formats into a single, comprehensive data model that can scale up to billions of nodes and edges.</span></p>
<p><span>This is essential in multi-intelligence or all-source analysis to identify suspicious patterns, anomalies or irregular behavior. Indeed, suspicious activities are more easily detected when you analyze the dynamics between entities and not just the characteristics of single entities. With this approach, analysts easily gather and analyze data about people, events, and locations for example, into one view.</span></p>
<div id=”attachment_6593″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”><em><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220286314?profile=original” target=”_self”><img width=”650″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220286314?profile=RESIZE_1024x1024″ width=”650″ class=”align-full”/></a></em></p>
<p class=”wp-caption-text” style=”text-align: center;”><em>Example of a people, object, location, and event (POLE) graph model.<br/> <br/></em></p>
</div>
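A POLE-style model like the one pictured can be sketched with plain Python structures; the labels, relationship names and properties below are hypothetical, chosen only to illustrate the node-and-edge shape of the data.

```python
# Nodes keyed by id, each with a POLE label (Person, Object, Location, Event) and properties
nodes = {
    "p1": {"label": "Person",   "props": {"name": "John Doe"}},
    "o1": {"label": "Object",   "props": {"type": "vehicle"}},
    "l1": {"label": "Location", "props": {"city": "Paris"}},
    "e1": {"label": "Event",    "props": {"date": "2016-03-01"}},
}

# Directed edges: (source, relationship, target)
edges = [
    ("p1", "INVOLVED_IN", "e1"),
    ("o1", "USED_IN",     "e1"),
    ("e1", "OCCURRED_AT", "l1"),
]

def neighbors(node_id):
    """Everything connected to a given node, regardless of entity type or direction."""
    out = [(rel, dst) for src, rel, dst in edges if src == node_id]
    out += [(rel, src) for src, rel, dst in edges if dst == node_id]
    return out

print(neighbors("e1"))
```

One query on the event node pulls in the person, object and location around it, which is exactly the "single view" benefit the model provides; a graph database applies the same idea at the scale of billions of nodes.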
<p><span>In the end, graph technology offers several advantages to intelligence and law enforcement agencies. It provides a single entry point to multiple data sources and data types that are integrated under a unique model. Analysts can produce intelligence from the analysis of heterogeneous data </span><i><span>and </span></i><span>its connections.</span></p>
<p><span>Introducing graph databases into an organization comes with a set of new challenges. How to let analysts access the data in a suitable way? How to enable them to </span><span>find information hidden in a complex web of billions of nodes and relationships? </span><span>That’s where graph visualization and analysis tools come in handy.</span></p>
<h3><span style=”font-size: 14pt;”>Graph visualization and analysis platform</span></h3>
<p><span>While the graph approach offers a unified model, finding insights in the enormous volume of data remains a challenge for analysts. Added to this is the pressure from intelligence consumers, who expect analysts to deliver intelligence insights in a timely manner.</span></p>
<p><span>As we previously explained, </span><a href=”https://linkurio.us/blog/why-graph-visualization-matters/”><span>visualization tools can be a great asset for investigation</span></a><span>.</span></p>
<ul>
<li><span>They are <strong>code-free</strong>;</span></li>
<li><span>They improve your <strong>efficiency</strong> because our brain understands images faster than it understands texts or figures;</span></li>
<li><span>These tools also increase the chances to <strong>discover insights</strong>.</span></li>
</ul>
<p><span>When you work with connected data, graph visualization and analysis is definitely a more efficient method than the traditional analysis of spreadsheets or data stored in relational databases.</span></p>
<p><span>In addition, graph analysis offers a valuable set of methods to get insights from connected data. For example, there are many algorithms, derived from graph theory and <a href=”https://en.wikipedia.org/wiki/Social_network_analysis”>social network analysis</a>, that can be used to identify communities, to spot highly connected individuals or to understand flows of information through a network.</span></p>
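The simplest of these measures, degree centrality, takes only a few lines; the snippet below is a toy illustration with invented data, not a production algorithm.

```python
from collections import Counter

# Undirected edges of a small, made-up communication network
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]

# Degree centrality: how many connections each individual has
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# The most highly connected individuals surface immediately
print(degree.most_common(2))
```

Community detection and flow analysis build on the same edge-counting foundations, just with more elaborate traversals.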
<p><span>Graph investigation tools, such as <a href=”https://linkurio.us/product/”>Linkurious Enterprise</a>, are an additional asset for intelligence analysts facing the challenges of big data. These tools are designed to enable analysts to uncover insights hidden in complex datasets by leveraging the power of graph databases. They also provide more agility than in-house tools or complex proprietary platforms, such as i2 or Palantir.</span></p>
<p><span>When it comes to threat detection and investigation, graph investigation tools reduce the complexity and noise induced by the nature and volume of the processed data.</span></p>
<h2><span style=”font-size: 18pt;”>Gathering intelligence with Linkurious Enterprise</span></h2>
<p><span>In Linkurious Enterprise, a complex data domain with different data sources or multiple entity types becomes a single, comprehensive graph. Analysts can visually investigate vast data collections. They can search for known patterns and suspicious links from a browser-based interface. <a href=”https://linkurio.us/blog/linkurious-2-5-filter-style-new-graphics-engine/”>Data filters and visual styles</a> help them focus on what’s important and reduce the noise generated by large amounts of data.</span></p>
<p><span>Below, we used OSINT data to showcase some of the visualization and analysis capabilities of Linkurious Enterprise. We used a publicly available dataset, the </span><a href=”http://www.start.umd.edu/gtd/terms-of-use/CitingGTD.aspx”><span>Global Terrorism Database</span></a><span>. We modeled part of the data (from 2013 to 2016) into a graph database following a simple graph model.</span></p>
<div id=”attachment_6591″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”><em><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220286339?profile=original” target=”_self”><img width=”650″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220286339?profile=RESIZE_1024x1024″ width=”650″ class=”align-center”/></a></em></p>
<p class=”wp-caption-text” style=”text-align: center;”><em>Graph model used for the Global Terrorism Database data.<br/> <br/></em></p>
</div>
<p><span>The data was then ingested into a graph database using a </span><a href=”https://gist.github.com/elisedeux/23954c8932ccfe94be7126a240688a82″><span>script</span></a><span> (there are a <a href=”http://blog.bruggen.com/2015/09/part-13-experimenting-with-pole-global.html”>few different options </a></span><a href=”http://blog.bruggen.com/2015/09/part-13-experimenting-with-pole-global.html”><span>to model</span></a><span> and import the data). Our database contains about 90,000 nodes and 240,000 relationships that are now all available for investigation in Linkurious Enterprise.</span></p>
<h3><span style=”font-size: 14pt;”><strong>Data investigation</strong></span></h3>
<p><span>Analysts can use full-text search capacities to look for specific information in the database. With a few clicks, it’s possible to visualize all the terrorist attacks that happened in France between 2013 and 2016. The brown and green nodes respectively represent the provinces of France and their cities. Each blue node represents a terrorist act recorded in a city. When the authors are known, they are symbolized by yellow nodes linked to the events.</span></p>
<div id=”attachment_6590″ class=”wp-caption aligncenter”><br/><p class=”wp-caption-text” style=”text-align: center;”><em><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220286823?profile=original” target=”_self”><img width=”750″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220286823?profile=RESIZE_1024x1024″ width=”750″ class=”align-center”/></a></em></p>
<p class=”wp-caption-text” style=”text-align: center;”><em>Visualization of events recorded in France between 2013 and 2017, their locations and the authors.</em></p>
</div>
<p> </p>
<p> </p>
<table>
<tbody><tr><td><span><a href=”https://linkurio.us/wp-content/uploads/2018/08/france_1.png” class=”fancybox image” rel=”prettyPhoto” title=””><img class=”lazy aligncenter wp-image-6586 lazy-loaded” src=”https://linkurio.us/wp-content/uploads/2018/08/france_1.png” alt=”Screenshot from Linkurious Enterprise” width=”150″ height=”112″/></a></span><i><span>1: Terrorist activities recorded in the Ile de France province.<br/> [Click to enlarge]</span></i></td>
<td><i><span><a href=”https://linkurio.us/wp-content/uploads/2018/08/France_2.png” class=”fancybox image” rel=”prettyPhoto” title=””><img class=”lazy aligncenter wp-image-6587 lazy-loaded” src=”https://linkurio.us/wp-content/uploads/2018/08/France_2.png” alt=”Screenshot from Linkurious Enterprise” width=”150″ height=”110″/></a><br/> 2: Terrorist activities perpetrated by Salafi jihadist groups in France. [Click to enlarge]</span></i></td>
<td><span><a href=”https://linkurio.us/wp-content/uploads/2018/08/france_3.png” class=”fancybox image” rel=”prettyPhoto” title=””><img class=”lazy aligncenter wp-image-6588 lazy-loaded” src=”https://linkurio.us/wp-content/uploads/2018/08/france_3.png” alt=”Screenshot from Linkurious Enterprise” width=”150″ height=”168″/></a></span><i><span>3: Terrorist activities recorded in Corsica. [Click to enlarge]</span></i></td>
<td><span><a href=”https://linkurio.us/wp-content/uploads/2018/08/france_4.png” class=”fancybox image” rel=”prettyPhoto” title=””><img class=”lazy aligncenter wp-image-6589 lazy-loaded” src=”https://linkurio.us/wp-content/uploads/2018/08/france_4.png” alt=”Screenshot from Linkurious Enterprise” width=”150″ height=”128″/></a></span><i><span>4: Terrorist activities recorded in southern France regions. [Click to enlarge]</span></i></td>
</tr>
</tbody>
</table>
<p><span><br/> The underlying graph structure gives us a better understanding of the events. The connections and the different categories of data (events, locations, people) provide some contextual information helpful for the analysis. For instance, by looking at the node clusters and the relationships, we can identify that:</span></p>
<ul>
<li><span>Terrorist activity in France has been significant;</span></li>
<li><span>Ile-de-France has been a major target for terrorism since 2015;</span></li>
<li><span>The activity of Salafi-jihadist groups (ISIS, ISIL, Al Qaeda, etc.) has been significant, especially in the Paris area;</span></li>
<li><span>The region of Corsica is largely affected by terrorist acts;</span></li>
<li><span>Some specific southern regions are major places of incidents.</span></li>
</ul>
<p><span>From this quickly generated visualization, we are able to identify the main terrorist trends in France (rise of Islamic terrorism, local conflicts, and nationalist movements). In a real-life scenario, professional intelligence analysts can provide accurate reports based on the analysis of conflict and terrorist data.</span></p>
<h3><span style=”font-size: 14pt;”><strong>Geospatial visualization</strong></span></h3>
<p><span>Graph models support the aggregation of heterogeneous data, so it’s possible to enrich our OSINT data with geospatial information. In our example, every “Event” node carries geolocation properties, allowing us to display them on a map within Linkurious Enterprise. Below is an example of a geospatial visualization, with events represented as red nodes on the map, covering a month of terrorist activity in 2014.</span></p>
<p style=”text-align: center;”><span><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220289973?profile=original” target=”_self”><img width=”750″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220289973?profile=RESIZE_1024x1024″ class=”align-center”/></a></span></p>
<div id=”attachment_6585″ class=”wp-caption aligncenter”><p class=”wp-caption-text” style=”text-align: center;”><em>A geo-visualization of the 1731 events reported in July 2014.<br/> <br/></em></p>
</div>
<p><span>Within intelligence teams, this feature is used to track a series of events happening in a region over a short time-frame. A cluster of kinetic events is a known pattern to a trained analyst, who can identify correlations and underlying terrorist tactics.</span></p>
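<p>The kind of spatio-temporal clustering an analyst looks for can be sketched in a few lines. This is a hypothetical illustration, not the Linkurious Enterprise API: the event records, region names, and thresholds are made up.</p>

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (region, date) pairs
events = [
    ("Corsica", date(2014, 7, 2)),
    ("Corsica", date(2014, 7, 5)),
    ("Corsica", date(2014, 7, 9)),
    ("Ile-de-France", date(2014, 7, 20)),
]

def clusters(events, min_events=3, window_days=14):
    """Flag regions with min_events or more events inside a sliding time window."""
    by_region = defaultdict(list)
    for region, day in events:
        by_region[region].append(day)
    flagged = []
    for region, days in by_region.items():
        days.sort()
        for i in range(len(days) - min_events + 1):
            if (days[i + min_events - 1] - days[i]).days <= window_days:
                flagged.append(region)
                break
    return flagged

print(clusters(events))  # → ['Corsica']
```

<p>A real deployment would of course run over thousands of geolocated nodes, but the point stands: a cluster of kinetic events reduces to a simple query over the date and location properties of the “Event” nodes.</p>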
<h3><span style=”font-size: 14pt;”><strong>Pattern detection</strong></span></h3>
<p><span>The advantage of graph databases is that they allow you to quickly traverse a high number of entities and relationships to retrieve information. This is a big change from systems based on relational databases, in which querying connections is a compute- and memory-intensive operation with an exponential cost. With Linkurious Enterprise, it’s possible to leverage the power of graph databases to search for a specific scenario, such as “<em>are these two seemingly unconnected terrorist groups connected, and if so, how?</em>”. For intelligence analysts, this can help identify key individuals, correlate a series of events with people, or understand the dynamics at work within an organization. Combined with their knowledge and experience, detecting patterns in connected data is an additional asset for conducting intelligence analysis.</span></p>
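<p>The “are these two groups connected, and if so, how?” question is, at bottom, a shortest-path traversal. Here is a rough sketch on made-up data, using a plain breadth-first search rather than a graph database’s native traversal engine:</p>

```python
from collections import deque

# Hypothetical graph: two groups linked through shared people and events
graph = {
    "Group A": ["Person 1"],
    "Person 1": ["Group A", "Event X"],
    "Event X": ["Person 1", "Person 2"],
    "Person 2": ["Event X", "Group B"],
    "Group B": ["Person 2"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search: returns the shortest chain of nodes, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path(graph, "Group A", "Group B"))
# → ['Group A', 'Person 1', 'Event X', 'Person 2', 'Group B']
```

<p>A graph database executes this kind of traversal natively over millions of nodes; the relational equivalent would require a self-join per hop.</p>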
<p><span>Below is an example of a visualization generated with a graph query that matches the world’s ten deadliest attacks since 2013 and their connections to groups, cities, and locations.</span></p>
<p></p>
<div id=”attachment_6584″ class=”wp-caption aligncenter” style=”text-align: center;”><br/><p class=”wp-caption-text” style=”text-align: center;”><em><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220290530?profile=original” target=”_self”><img width=”750″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220290530?profile=RESIZE_1024x1024″ class=”align-center”/></a>A graph visualization of the ten attacks with the highest number of victims and their connections to locations, authors, and targets.<br/> <br/></em></p>
</div>
<p><span>In Linkurious Enterprise, pattern detection can be automated as alerts. This reduces the analysts’ workload, with the platform automatically monitoring large volumes of data to uncover hidden connections and complex patterns.</span></p>
<p></p>
<h3><span style=”font-size: 14pt;”>Going further<br/> <br/></span></h3>
<p><span>In our examples, we created our database in a limited time, from a single data source. However, it is possible to add data from additional sources to enrich the database, depending on the questions you want to answer. For instance, you could add data from phone interceptions or financial transactions to identify potential relationships between attacks.</span></p>
<p>In addition to what we just saw, analysts can use advanced graph analysis, a set of methods expressly designed to find insights in connected data. There are, for instance, many algorithms derived from graph theory that can be used to identify communities, spot people who occupy a key position in a network, or understand how information, money, or people flow through a network.</p>
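<p>As a taste of these algorithms, degree centrality is one of the simplest graph-theory measures for spotting who occupies a key position. A minimal sketch on made-up data (real analyses would also use richer measures such as betweenness centrality or community detection):</p>

```python
# Hypothetical network edges: who is linked to whom
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("D", "F")]

# Count each node's connections
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1

# Normalized degree centrality: fraction of the other nodes each node touches
n = len(degree)
centrality = {node: d / (n - 1) for node, d in degree.items()}
key_node = max(centrality, key=centrality.get)  # a best-connected node
```
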
<p><span>In the end, graph technology enables intelligence analysts to tackle the changes induced by the big data era. It’s an asset in the processing, storage, and analysis of the complex data collected today. While graph databases are great for aggregating and connecting a multitude of sources in one place, Linkurious Enterprise helps teams of analysts easily find hidden intelligence within large graphs. It highlights connections in the data, allowing analysts to better understand and analyze complex situations. Ultimately, analysts can better exploit their data to generate high-value insights.<br/> <br/> <a href=”https://linkurio.us/” target=”_blank” rel=”noopener”>Learn more here</a><br/></span></p>



Invitation to Join Data Science Central tag:www.analyticbridge.datasciencecentral.com,2018-09-08:2004291:BlogPost:388034
2018-09-08T17:14:58.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p><span>Join the largest community of machine learning (ML), deep learning, AI, data science, business analytics, BI, operations research, mathematical and statistical professionals: <a href=”https://www.datasciencecentral.com/main/authorization/signUp?” target=”_self”>Sign up here</a>. If instead, you are only interested in receiving our newsletter, you can subscribe <a href=”https://www.datasciencecentral.com/page/newsletter” target=”_blank” rel=”noopener”>here</a>. There is no cost.</span></p>
<p><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/125265117?profile=original” target=”_self”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/125265117?profile=original” width=”400″ class=”align-center”/></a></span></p>
<p><span class=”font-size-3″>The full membership includes, in addition to the newsletter subscription:</span></p>
<ul>
<li><span>Access to <a href=”https://www.datasciencecentral.com/page/member” target=”_blank” rel=”noopener”>member-only pages</a>, our free data science eBooks, data sets, code snippets, and solutions to data science / machine learning / mathematical challenges.</span></li>
<li><span class=”font-size-3″>Support to all your questions regarding our community.</span></li>
<li><span class=”font-size-3″>Data sets, projects, cheat sheets, tutorials, programming tips, summarized information easy to digest, DSC webinars, data science events (conferences, workshops), new books, and news. </span></li>
<li><span class=”font-size-3″>Ability to post <a href=”https://www.datasciencecentral.com/profiles/blog/list?promoted=1″ target=”_blank” rel=”noopener”>blogs</a> and <a href=”https://www.datasciencecentral.com/forum/topic/featured” target=”_blank” rel=”noopener”>forum questions</a>, as well as comments, and get answers from experts in their field. </span></li>
</ul>
<p><span class=”font-size-3″>You can easily unsubscribe at any time. Our weekly digest features selected discussions, articles written by experts, forum questions and announcements aimed at machine learning, AI,  IoT, analytics, data science, BI, operations research and big data practitioners.</span></p>
<p><span class=”font-size-3″>It covers topics such as deep learning, AI, blockchain, visualization, automated machine learning, Hadoop, data integration and engineering, statistical science, computational statistics, analytics, pure data science, data security, and even computer-intensive methods in number theory. It includes</span></p>
<ul>
<li><span class=”font-size-3″>Exclusive content for subscribers only: our upcoming book on automated data science (coming soon), detailed research reports about the data science community (for instance, best cities for data scientists, with growth trends), APIs (top Twitter accounts, various forecasting apps) and more</span></li>
<li><span class=”font-size-3″>New book and new journal announcements</span></li>
<li><span class=”font-size-3″>Salary surveys – for instance, how much a Facebook data scientist makes</span></li>
<li><span class=”font-size-3″>Workshops, webinars and conference announcements </span></li>
<li><span class=”font-size-3″>Programs and certifications for data scientists</span></li>
<li><span class=”font-size-3″>Case studies, success stories, benchmarks</span></li>
<li><span class=”font-size-3″>New analytic companies/products announcements</span></li>
<li><span class=”font-size-3″>Sample source code, questions about coding and algorithms</span></li>
</ul>
<p><span class=”font-size-3″><a href=”https://storage.ning.com/topology/rest/1.0/file/get/125265129?profile=original” target=”_self”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/125265129?profile=original” width=”713″ class=”align-center”/></a></span></p>
<p><span class=”font-size-3″><strong><a href=”https://www.datasciencecentral.com/main/authorization/signUp?” target=”_self”>Click here to sign up</a></strong> and start receiving our newsletter. We respect your privacy: member information (email address etc.) is kept confidential and never shared.</span></p>



Type I and Type II Errors in One Picture tag:www.analyticbridge.datasciencecentral.com,2017-08-10:2004291:BlogPost:369586
2017-08-10T23:17:32.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>This picture speaks louder than words. It explains the concepts of false positives and false negatives, that is, what statisticians refer to as Type I and Type II errors.</p>
<p></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220282767?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220282767?profile=original” width=”472″ class=”align-center”/></a></p>
<p>Other great pictures summarizing data science and statistical concepts can be found <a href=”http://www.datasciencecentral.com/profiles/blogs/four-great-pictures-illustrating-machine-learning-concepts” target=”_blank”>here</a> and also <a href=”http://www.datasciencecentral.com/profiles/blogs/17-amazing-infographics-and-other-visual-tutorials” target=”_blank”>here</a>. </p>
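<p>The picture’s distinction can also be stated numerically. A minimal sketch with made-up confusion-matrix counts:</p>

```python
# Made-up confusion-matrix counts for a binary test
tp, fp = 40, 10   # predicted positive: correct / false alarm (Type I)
fn, tn = 5, 45    # predicted negative: miss (Type II) / correct

type_i_rate = fp / (fp + tn)    # false positive rate, alpha
type_ii_rate = fn / (fn + tp)   # false negative rate, beta
power = 1 - type_ii_rate        # probability of catching a true effect
```
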
<p><b>DSC Resources</b></p>
<ul>
<li>Services: <a href=”http://careers.analytictalent.com/jobs/products”>Hire a Data Scientist</a> | <a href=”http://www.datasciencecentral.com/page/search?q=Python”>Search DSC</a> | <a href=”http://classifieds.datasciencecentral.com/”>Classifieds</a> | <a href=”http://www.analytictalent.com/”>Find a Job</a></li>
<li>Contributors: <a href=”http://www.datasciencecentral.com/profiles/blog/new”>Post a Blog</a> | <a href=”http://www.datasciencecentral.com/forum/topic/new”>Ask a Question</a></li>
<li>Follow us: <a href=”http://www.twitter.com/datasciencectrl”>@DataScienceCtrl</a> | <a href=”http://www.twitter.com/analyticbridge”>@AnalyticBridge</a></li>
</ul>
<p>Popular Articles</p>
<ul>
<li><a href=”http://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning”>Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blogs/20-articles-about-core-data-science”>What is Data Science? 24 Fundamental Articles Answering This Question</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blogs/hitchhiker-s-guide-to-data-science-machine-learning-r-python”>Hitchhiker’s Guide to Data Science, Machine Learning, R, Python</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel”>Advanced Machine Learning with Basic Excel</a></li>
</ul>



Linear Models Don’t have to Fit Exactly for P-Values To Be Accurate, Right, and Useful tag:www.analyticbridge.datasciencecentral.com,2017-11-03:2004291:BlogPost:374045
2017-11-03T05:30:00.000Z


Chirag Shivalker
https://www.analyticbridge.datasciencecentral.com/profile/ChiragShivalker



<p>There is no need to confuse multiple linear regression, the generalized linear model, and the general linear model. The general linear model, or multivariate regression model, is a statistical linear model written as <strong>Y = XB + U</strong>.</p>
<p><br/> <img width=”750″ src=”http://storage.ning.com/topology/rest/1.0/file/get/2220282286?profile=RESIZE_1024x1024″/></p>
<p><br/> The umbrella of linear models includes a number of different statistical procedures such as ANOVA, ANCOVA, MANOVA, MANCOVA, ordinary linear regression, the t-test, and the F-test. The GLM is a generalization of multiple linear regression to the case of more than one dependent variable. If Y, B, and U are column vectors, the matrix equation above reduces to a multiple linear regression.<br/> <br/> <span class=”font-size-4″>What are the key assumptions made in a multiple linear regression analysis?</span><br/> <br/> <strong>Linearity:</strong> Independent variables and the outcome variable should have a linear relationship; scatterplots can be used to find out whether the relationship is linear or curvilinear.<br/></p>
<ul>
<li><strong>Multivariate Normality:</strong> Residuals are normally distributed, as is assumed in multiple regression.</li>
<li><strong>No Multicollinearity:</strong> Independent variables are not correlated among themselves, as is assumed in multiple regression. The Variance Inflation Factor (VIF) is used to test this assumption.</li>
<li><strong>Homoscedasticity:</strong> Error terms have similar variance across values of the independent variables. A plot of predicted values vs. standardized residuals shows whether points are equally distributed across all values of the independent variables.</li>
</ul>
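<p>In the two-predictor case, the VIF mentioned above reduces to 1 / (1 - r²), where r is the correlation between the two predictors. A minimal sketch with made-up, deliberately collinear data:</p>

```python
import math

# Made-up predictors; x2 is almost a linear function of x1 by construction
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9]  # roughly 2 * x1

def pearson(a, b):
    """Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

r = pearson(x1, x2)
vif = 1 / (1 - r ** 2)  # values above roughly 5-10 signal multicollinearity
```

<p>With more than two predictors, the VIF for each predictor comes from regressing it on all the others; statistical packages compute this directly.</p>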
<p><br/> <a href=”http://www.hitechbpo.com/market-research-and-data-analytics.php” target=”_blank”>Best data analytic solutions</a> automatically include assumption tests and plots when conducting regression. Multiple regression requires at least two independent variables, which can be nominal, ordinal, or interval/ratio level variables. A rule of thumb for sample size is that regression analysis requires at least 20 cases per independent variable.<br/> <br/> <span class=”font-size-4″>Assumptions in your regression or ANOVA model</span><br/> <br/> These assumptions matter: if they are not met adequately, the p-values become inaccurate, wrong, &amp; useless. Yet linear models don’t have to fit precisely for p-values to be accurate, right and useful; they are reasonably robust to departures from these assumptions. Statistics classes and other coaching venues teach both statements, seemingly contradictory enough to drive analysts crazy.<br/> <br/> <em>One might wonder whether statisticians cooked this stuff up to torture researchers, pun intended, or to satisfy their egos.</em><br/> <br/> In reality, they did neither. Learning when assumptions can be relaxed is not that hard a task, given professional training, guidance, and some practice. <em>Listed below are a few of the mistakes researchers make because of one, or both, of the claims above.</em><br/> <br/> <strong><span class=”font-size-3″>1. Treating the p-value as a feel-good factor</span></strong><br/> <br/> One way out is to avoid over-testing assumptions. Statistical tests can help determine whether assumptions are met adequately, and obtaining a p-value below the golden threshold of p&lt;.05 feels reassuring. But such tests should never ignore robustness. Assuming that every distribution is non-normal and heteroskedastic would be a mistake. Assumption-testing tools may prove helpful, but they treat every data set as if it were a nail. The right approach is to use the hammer only when it is really needed, not to hammer everything.<br/> <br/> <strong><span class=”font-size-3″>2. The GLM is robust, but not to all assumption violations</span></strong><br/> <br/> The opposite mistake is to assume everything is robust and skip the tests. This practice succeeds most of the time, but not always. The GLM is robust to deviations from some of the assumptions, but not all the way and not to all of them, so check each one without fail.<br/> <br/> <strong><span class=”font-size-3″>3. Testing the wrong assumptions</span></strong><br/> <br/> Researchers also test the wrong assumptions. Look at any two regression books and they will give you different sets of assumptions.<br/> <br/> <span class=”font-size-4″>Testing the related, but wrong, thing</span><br/> <br/> Several of these “assumptions” should indeed be checked, but they are not model assumptions; they are data challenges. Reference guides sometimes take the assumptions to convenient logical conclusions, leading you to test something related but wrong. That works most of the time, but not always.</p>



High Precision Computing in Python or R tag:www.analyticbridge.datasciencecentral.com,2017-11-14:2004291:BlogPost:373990
2017-11-14T02:00:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>Here we discuss an application of HPC (not high performance computing, but high precision computing, which is a special case of HPC) applied to dynamical systems such as the logistic map of chaos theory, defined as X(k) = 4 X(k-1) (1 – X(k-1)). </p>
<p>For all these systems, the loss of precision propagates exponentially, to the point that after 50 iterations, all generated values are completely wrong. Tons of articles have been written on this subject, none of them acknowledging that the numbers used are faulty, as round-off errors propagate as fast as the chaos itself. This is an active research area with applications in population dynamics, physics, and engineering. It does not invalidate the published results, as most of them are theoretical in nature and do not impact the limiting distribution: the faulty sequences behave as instances of processes that are re-seeded every 40 iterations or so due to errors, and they behave the same way regardless of the seed. </p>
<p>The core of the discussion here is about how to write code that produces far more accurate numbers, whether in R, Python or other languages, using super precision. In short, which libraries should you use to handle such problems?</p>
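<p>Libraries such as mpmath for Python or Rmpfr for R are the usual choices for serious high-precision work, but Python’s standard decimal module is enough to demonstrate the issue. A sketch, where the 100-digit precision and the 0.2 seed are arbitrary choices:</p>

```python
from decimal import Decimal, getcontext

getcontext().prec = 100  # 100 significant digits (an arbitrary choice)

x_float = 0.2             # ~16 significant digits (IEEE double)
x_high = Decimal("0.2")   # 100-digit high-precision track
for _ in range(60):
    # Logistic map: X(k) = 4 X(k-1) (1 - X(k-1))
    x_float = 4 * x_float * (1 - x_float)
    x_high = 4 * x_high * (1 - x_high)

# Round-off error roughly doubles each iteration, so after 60 iterations
# the double-precision trajectory no longer tracks the true one at all
print(abs(x_float - float(x_high)))
```

<p>Both trajectories stay in [0, 1], but their values diverge completely, which is exactly the loss-of-precision effect described above.</p>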
<p>You can check out the context, Perl code, Python code, and an Excel spreadsheet that illustrates the issue in this discussion.  </p>
<p><a href=”https://www.datasciencecentral.com/forum/topics/question-how-precision-computing-in-python” target=”_blank”>Click here to read the full article</a>. </p>
<p></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220282290?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220282290?profile=original” width=”350″ class=”align-center”/></a></p>
<p style=”text-align: center;”><em>This broccoli is an example of the self-replicating processes that could benefit from HPC</em></p>



Supervised learning in disguise: the truth about unsupervised learning tag:www.analyticbridge.datasciencecentral.com,2018-02-14:2004291:BlogPost:380742
2018-02-14T20:00:00.000Z


Danko Nikolic
https://www.analyticbridge.datasciencecentral.com/profile/DankoNikolic



<p>One of the first lessons you’ll receive in machine learning is that there are two broad categories: supervised and unsupervised learning. Supervised learning is usually explained as the approach in which you provide the correct answers as training data, and the machine learns the patterns to apply to new data. Unsupervised learning is (apparently) where the machine figures out the correct answer on its own.</p>
<p>Supposedly, unsupervised learning can discover something new that has not been found in the data before. Supervised learning cannot do that.</p>
<h2><span style=”color: #ff6600;”>The problem with definitions</span></h2>
<p>It’s true that there are two classes of machine learning algorithm, and each is applied to different types of problems, but is unsupervised learning really free of supervision?</p>
<p>In fact, this type of learning also involves a whole lot of supervision, but the supervision steps are hidden from the user. This is because the supervision is not explicitly presented in the data; you can only find it within the algorithm.</p>
<p>To understand this, let us first consider the use of supervised learning. A prototypical method for supervised learning is regression. Here, the input and the output values – named X and Y respectively – are provided for the algorithm. The learning algorithm then adjusts the model’s parameters so that it predicts the outputs (Y) for new inputs (X) as accurately as possible.</p>
<p>In other words, supervised learning finds a function: Y’ = f(X)</p>
<h2><span style=”color: #ff6600;”>Supervised learning success</span></h2>
<p>Supervised learning success is assessed by seeing how close Y’ is to Y, i.e. by computing an error function.</p>
<p>This general principle of supervision in learning is the basic principle for logistic regression, support vector machines, decision trees, deep learning networks and many other techniques.</p>
<p>In contrast, unsupervised learning does not provide Y for the algorithm – only X is provided. Thus, for each given input we do not explicitly provide a correct output. The machine’s task is to “discover” Y on its own.</p>
<p>A common example is <a href=”https://en.wikipedia.org/wiki/Cluster_analysis”>cluster (or clustering) analysis</a>. Before a clustering analysis, there aren’t known clusters for the data points within the inputs, and yet the machine finds those clusters after the analysis. It’s almost as if the machine is creative – discovering something new in the data.</p>
<h2><span style=”color: #ff6600;”>Nothing new</span></h2>
<p>In fact, there is nothing new; the machine discovers only what it has been told to discover. Every unsupervised algorithm specifies what needs to be found in the data.</p>
<p>There must be a criterion saying what success is. We don’t let algorithms do whatever they want, or ask machines to perform random analyses. There is always a goal to be accomplished, and that goal is carefully formulated as a constraint within the algorithm.</p>
<p>For example, in a clustering algorithm, you may require the distances between cluster centroids to be maximized, while the distances between data points belonging to the same cluster are minimized. Plus, for each data set there is an implicit Y: for example, maximizing the ratio of between-cluster to within-cluster distances.</p>
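<p>To make the “implicit Y” concrete, here is a toy one-dimensional k-means in plain Python. The data and initial centers are made up; the within-cluster sum of squares computed at the end is precisely the internal criterion the algorithm is told to optimize:</p>

```python
# Made-up 1-D data with two obvious groups
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
k = 2

def assign(data, centers):
    # Each point joins its nearest center: the hidden, per-point "answer"
    return [min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2) for x in data]

def update(data, labels, k):
    # Each center moves to the mean of its points
    # (an empty cluster's center resets to 0.0 here; real code would re-seed it)
    return [
        sum(x for x, l in zip(data, labels) if l == j) / max(1, labels.count(j))
        for j in range(k)
    ]

centers = [0.0, 1.0]   # arbitrary starting centers
for _ in range(10):    # a few Lloyd iterations are enough on this toy data
    labels = assign(data, centers)
    centers = update(data, labels, k)

# The internalized criterion ("implicit Y"): within-cluster sum of squares
wcss = sum((x - centers[l]) ** 2 for x, l in zip(data, labels))
```

<p>No labels were supplied, yet every step is governed by the squared-distance objective baked into the algorithm, which is the hidden supervision this article describes.</p>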
<p>Therefore, the lack of supervision in these algorithms is nothing like the metaphorical “unsupervised child in a porcelain shop”, as this would not give us particularly useful machine learning. Instead, what we have is more akin to letting adults enter a porcelain shop without having to send a nanny too. The reason for our trust in adults is that they have already been supervised during childhood and have since (hopefully) internalized some of the rules.</p>
<p>Something similar happens with unsupervised machine learning algorithms; supervision has been internalized, as these methods come equipped with criteria that define good and bad model behaviour. Just as (most) adults have an internal voice telling them not to smash every item in the shop, unsupervised machine learning methods possess internal machinery that dictates what constitutes good behaviour.</p>
<h2><span style=”color: #ff6600;”>Supervised vs. unsupervised</span></h2>
<p>Fundamentally, the difference between supervised and unsupervised learning boils down to whether the computation of error utilizes an externally provided Y, or whether Y is internally computed from input data (X).</p>
<p>In both cases there is a form of supervision.</p>
<p>As all unsupervised learning is actually supervised, the main differentiator becomes the frequency at which intervention takes place. For example, do we intervene for each data point or just once, when the algorithm for computing Y out of X is designed?</p>
<p>Hence, within the so-called unsupervised methods, supervision is present, but hidden (it is disguised) because no special effort is required from the end user to supply supervision data. The algorithm seems to be magically supervised without an apparent supervisor. However, this does not mean that someone hasn’t gone through the pain of setting up the proper equations to implement an internal supervisor.</p>
<p>Consequently, unsupervised learning methods don’t truly discover anything new in any way that would overshadow the “discoveries” of supervised methods.</p>
<p></p>
<p>This blog entry is reposted from my original blog entry at <a href=”http://www.teradata.com”>www.teradata.com</a></p>
<p></p>



Machine Learning with Signal Processing Techniques tag:www.analyticbridge.datasciencecentral.com,2018-04-29:2004291:BlogPost:382590
2018-04-29T15:00:00.000Z


ahmet taspinar
https://www.analyticbridge.datasciencecentral.com/profile/ahmettaspinar



<p>Stochastic Signal Analysis is a field of science concerned with the processing, modification and analysis of (stochastic) signals.</p>
<p>Anyone with a background in Physics or Engineering knows to some degree about signal analysis techniques, what these techniques are, and how they can be used to analyze, model and classify signals.</p>
<p>Data Scientists coming from different fields, like Computer Science or Statistics, might not be aware of the analytical power these techniques bring with them.</p>
<p>In this blog post, we will have a look at how we can use Stochastic Signal Analysis techniques, in combination with traditional Machine Learning Classifiers for accurate classification and modelling of time-series and signals.</p>
<p>At the end of the blog post you should be able to understand the various signal-processing techniques which can be used to retrieve features from signals, and be able to <a href=”http://ieeexplore.ieee.org/abstract/document/1306572/” target=”_blank” rel=”noopener”>classify ECG signals</a><span> </span>(and even<span> </span><a href=”http://ieeexplore.ieee.org/abstract/document/4427376/” target=”_blank” rel=”noopener”>identify a person</a> by their ECG signal),<span> </span><a href=”https://www.kaggle.com/c/seizure-prediction” target=”_blank” rel=”noopener”>predict seizures</a><span> </span>from EEG signals,<span> </span><a href=”https://www.spiedigitallibrary.org/conference-proceedings-of-spie/5808/0000/Comparative-analysis-of-feature-extraction-2D-FFT-and-wavelet-and/10.1117/12.597305.short” target=”_blank” rel=”noopener”>classify and identify</a><span> </span>targets in radar signals, and<span> </span><a href=”https://link.springer.com/article/10.1007/s10916-005-5184-7″ target=”_blank” rel=”noopener”>identify patients with neuropathy or myopathy</a><span> </span>from EMG signals using the FFT and related techniques.</p>
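<p>As a taste of the feature-extraction step, here is a naive discrete Fourier transform in plain Python recovering the dominant frequency of a synthetic signal. The 5 Hz sine and the sampling rate are made up for illustration; a real pipeline would use numpy.fft or scipy.signal instead of this O(N²) loop:</p>

```python
import cmath
import math

# Made-up signal: a 5 Hz sine sampled at 64 Hz for one second
n, fs, f0 = 64, 64, 5
signal = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / N) for t in range(N))
        for k in range(N)
    ]

spectrum = dft(signal)
# The dominant bin (first half only; the second half mirrors it) is a feature
peak = max(range(n // 2), key=lambda k: abs(spectrum[k]))
freq = peak * fs / n  # → 5.0 Hz
```

<p>Frequency-domain features like this peak location, fed into a traditional classifier, are exactly the kind of signal/ML combination the post goes on to describe.</p>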
<p> <a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220288085?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220288085?profile=original” width=”522″ class=”align-center”/></a></p>
<p>To read more, <a href=”http://ataspinar.com/2018/04/04/machine-learning-with-signal-processing-techniques/” target=”_blank” rel=”noopener”>click here</a>.</p>
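As a minimal sketch of the approach (the synthetic 5 Hz vs. 12 Hz signals, the sampling rate, and the scikit-learn classifier are invented for illustration and are not taken from the linked post), one can use FFT magnitudes as features for an ordinary classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class problem: noisy 5 Hz vs 12 Hz sine waves, 1 s at 128 Hz.
rng = np.random.default_rng(42)
t = np.arange(0, 1, 1 / 128)
X, y = [], []
for label, freq in [(0, 5.0), (1, 12.0)]:
    for _ in range(200):
        sig = np.sin(2 * np.pi * freq * t) + rng.normal(0, 1.0, t.size)
        X.append(np.abs(np.fft.rfft(sig)))  # one-sided FFT magnitudes as features
        y.append(label)
X, y = np.array(X), np.array(y)

# A plain classifier separates the classes from the spectral features.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
accuracy = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
print(accuracy)
```

In practice, wavelet coefficients or power-spectral-density estimates can be substituted for the raw FFT magnitudes in the same pipeline.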



20 Questions to Ask Prior to Starting Data Analysis tag:www.analyticbridge.datasciencecentral.com,2018-05-24:2004291:BlogPost:383892
2018-05-24T02:30:00.000Z


Cynthia Clare
https://www.analyticbridge.datasciencecentral.com/profile/CynthiaClare



<div class=”aspectRatioPlaceholder is-locked”><div class=”progressiveMedia js-progressiveMedia graf-image is-imageLoaded is-canvasLoaded”><img class=”progressiveMedia-image js-progressiveMedia-image align-center” src=”https://cdn-images-1.medium.com/max/1600/1*Vnb4dXdN2L1vTExrp4cF4A.jpeg”/></div>
</div>
<p></p>
<p id=”139d” class=”graf graf–p graf-after–figure”>It is crucial to ask the right questions and understand the problem before beginning data analysis. Below is a list of 20 questions you need to ask before delving into analysis:</p>
<ol class=”postList”>
<li id=”ca7c” class=”graf graf–li graf-after–p”>Who is the audience that will use the results from the analysis? (board members, sales people, customers, employees, etc)</li>
<li id=”93d9″ class=”graf graf–li graf-after–li”>How will the results be used? (make business decisions, invest in a product category, work with a vendor, identify risks, etc)</li>
<li id=”e405″ class=”graf graf–li graf-after–li”>What questions will the audience have about our analysis? (ability to filter on key segments, look at data across time to identify trends, drill-down into details, etc)</li>
<li id=”992e” class=”graf graf–li graf-after–li”>How should the questions be prioritized to derive the most value?</li>
<li id=”9d33″ class=”graf graf–li graf-after–li”>Identify key stakeholders and get their input on interesting questions</li>
<li id=”94ee” class=”graf graf–li graf-after–li”>Who should be able to access the information? (think about confidentiality/security concerns)</li>
<li id=”f30f” class=”graf graf–li graf-after–li”>Who will develop and maintain the report?</li>
<li id=”95cf” class=”graf graf–li graf-after–li”>What information will be on each report?</li>
<li id=”348c” class=”graf graf–li graf-after–li”>What reports currently exist in another format? What changes might be made to existing reports?</li>
<li id=”9b08″ class=”graf graf–li graf-after–li”>What ETLs or stored procedures need to be developed, if any?</li>
<li id=”7218″ class=”graf graf–li graf-after–li”>What database enhancements are required to meet reporting requirements?</li>
<li id=”8c86″ class=”graf graf–li graf-after–li”>When will each report be delivered?</li>
<li id=”3190″ class=”graf graf–li graf-after–li”>What frequency of data updates is required to ensure currency?</li>
<li id=”08d1″ class=”graf graf–li graf-after–li”>Which data sources are available to work with?</li>
<li id=”4ab8″ class=”graf graf–li graf-after–li”>Do I have the required permissions or credentials to access the data necessary for analysis?</li>
<li id=”28bf” class=”graf graf–li graf-after–li”>What is the size of each data set, and how much data will I need to get from each one?</li>
<li id=”1798″ class=”graf graf–li graf-after–li”>How familiar am I with the underlying tables and schema in each database? Do I need to work with anyone else to understand the data structure?</li>
<li id=”ffeb” class=”graf graf–li graf-after–li”>Do I need all the data for more granular analysis, or do I need a subset to ensure faster performance?</li>
<li id=”e166″ class=”graf graf–li graf-after–li”>Will the data need to be standardized due to disparity?</li>
<li id=”e46a” class=”graf graf–li graf-after–li”>Will I need to analyze data from external sources, which resides outside of my organization’s data?</li>
</ol>
<p id=”ae84″ class=”graf graf–p graf-after–li”>Sources:</p>
<p id=”e981″ class=”graf graf–p graf-after–p”><a href=”https://www.sisense.com/blog/requirements-elicitation-enterprise-business-analytics/” class=”markup–anchor markup–p-anchor” rel=”noopener” target=”_blank”>https://www.sisense.com/blog/requirements-elicitation-enterprise-business-analytics/</a></p>
<p id=”4414″ class=”graf graf–p graf-after–p graf–trailing”><a href=”http://www.dallasisd.org/cms/lib/TX01001475/Centricity/domain/5173/updated%20files%20as%20of%200912/Data%20Analysis%20Guiding%20Questions.pdf” class=”markup–anchor markup–p-anchor” rel=”noopener” target=”_blank”>http://www.dallasisd.org/cms/lib/TX01001475/Centricity/domain/5173/updated%20files%20as%20of%200912/Data%20Analysis%20Guiding%20Questions.pdf</a></p>



Curious Mathematical Problem tag:www.analyticbridge.datasciencecentral.com,2018-08-31:2004291:BlogPost:387764
2018-08-31T05:00:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>Let us consider the following equation:</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220286074?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220286074?profile=original” width=”359″ class=”align-center”/></a></p>
<p>Prove that</p>
<ul>
<li><em>x</em> = log(Pi) = 1.14472988584… is a very good approximation of a solution, up to 10 digits.</li>
<li>Using <a href=”https://www.datasciencecentral.com/page/search?q=high+performance+computing” target=”_blank” rel=”noopener”>high performance computing</a> or other means, prove that it is correct up to 1,000 digits.</li>
<li>Is <em>x</em> = log(Pi) an exact solution?</li>
</ul>
<p>If the answer to the last question is positive, this would mean that log(Pi) is NOT a transcendental number, but rather, an algebraic number. A remarkable result in itself!</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220286475?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220286475?profile=original” width=”473″ class=”align-center”/></a></p>
<p style=”text-align: center;”><em>Source for picture: <a href=”https://en.wikipedia.org/wiki/Algebraic_number” target=”_blank” rel=”noopener”>algebraic numbers</a></em></p>
<p><strong>Solution and related problem</strong></p>
<p>Any real number larger than or equal to 1 is a solution, so there is nothing particular about log(Pi). A more subtle version of this problem is to ask the student to solve the following equation:</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220292303?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220292303?profile=original” width=”395″ class=”align-center”/></a></p>
<p>We know from the previous problem that if <em>x</em>^5 – <em>x</em>^2 – 1 = <em>x</em>^2 – 1, the equality holds. Thus, to find a solution, we just need to solve <em>x</em>^5 – <em>x</em>^2 – 1 = <em>x</em>^2 – 1, that is, <em>x</em>^5 = 2<em>x</em>^2. The cube root of 2 is a solution.</p>
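A quick numerical check of the cube-root solution (illustrative only):

```python
# x = 2^(1/3): the equation x^5 - x^2 - 1 = x^2 - 1 reduces to x^5 = 2*x^2,
# i.e. x^3 = 2 for x != 0, so the cube root of 2 makes both sides equal.
x = 2 ** (1 / 3)
lhs = x ** 5 - x ** 2 - 1
rhs = x ** 2 - 1
print(lhs, rhs)  # equal up to floating-point error
```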
<p>More generally, let’s define</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220292348?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220292348?profile=original” width=”617″ class=”align-center”/></a></p>
<p>Then the (unique) real-valued solution to the equation <em>f</em>(<em>x</em>) = 0 is given by</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220299396?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220299396?profile=original” width=”95″ class=”align-center”/></a></p>
<p>In particular, if <em>p</em> = 3, then <em>x</em> = 2. If <em>p</em> = 2 + log(2) / log(3), then <em>x</em> = 3. Note that the function <em>f</em> is monotonic, and thus invertible. What is the inverse of <em>f</em>?</p>
<p><strong>Another curious problem </strong></p>
<p>Find an exact, non-trivial solution to the following equation:</p>
<p style=”text-align: center;”>sin <em>x</em> + sin(2<em>x</em> sin <em>x</em>) = sin 3<em>x.</em></p>
<p style=”text-align: left;”>This is part of a general type of equation, namely <em>f</em>(<em>x</em>) + <em>f</em>(<em>x</em> <em>f</em>(<em>x</em>)) = 2 <em>f</em>(1), <span>where one of the solutions must of course also be a solution of <em>f</em>(<em>x</em>) = 1. Here, <em>f</em>(<em>x</em>) = 2 sin <em>x</em>, and thus <em>x</em> = Pi / 6 is a solution. See plot of sin <em>x</em> + sin(2<em>x</em> sin <em>x</em>) – sin 3<em>x</em>, below.</span><span><a href=”https://storage.ning.com/topology/rest/1.0/file/get/1800999822?profile=original” target=”_blank” rel=”noopener”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/1800999822?profile=RESIZE_710x” class=”align-center”/></a></span></p>
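A one-line numerical check that x = Pi / 6 solves the equation (illustrative only):

```python
from math import sin, pi

# At x = pi/6, sin(x) = 1/2, so 2*x*sin(x) = x and both sides equal 1.
x = pi / 6
residual = sin(x) + sin(2 * x * sin(x)) - sin(3 * x)
print(residual)  # ~0 up to floating-point error
```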
<p><em>For related articles from the same author, <a href=”http://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles” target=”_blank” rel=”noopener”>click here</a><span> </span>or visit<span> </span><a href=”http://www.vincentgranville.com/” target=”_blank” rel=”noopener”>www.VincentGranville.com</a>. Follow me<span> </span><a href=”https://www.linkedin.com/in/vincentg/” target=”_blank” rel=”noopener”>on LinkedIn</a>, or visit my old web page<span> </span><a href=”http://www.datashaping.com”>here</a>.</em></p>
<p><span style=”font-size: 14pt;”><b>DSC Resources</b></span></p>
<ul>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/invitation-to-join-data-science-central”>Invitation to Join Data Science Central</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/fee-book-applied-stochastic-processes”>Free Book: Applied Stochastic Processes</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources”>Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel”>Advanced Machine Learning with Basic Excel</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning”>Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles”>Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href=”http://careers.analytictalent.com/jobs/products”>Hire a Data Scientist</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/page/search?q=Python”>Search DSC</a><span> </span>|<span> </span><a href=”http://classifieds.datasciencecentral.com”>Classifieds</a><span> </span>|<span> </span><a href=”http://www.analytictalent.com”>Find a Job</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blog/new”>Post a Blog</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/forum/topic/new”>Forum Questions</a></li>
</ul>



Top 10 PHP Frameworks for Website Design and Development tag:www.analyticbridge.datasciencecentral.com,2018-07-30:2004291:BlogPost:387174
2018-07-30T10:30:00.000Z


Rajveer Singh Rathore
https://www.analyticbridge.datasciencecentral.com/profile/RajveerSinghRathore643



<p>PHP, known as the most popular server-side scripting language in the world, has evolved a lot since the first inline code snippets appeared in static HTML files.</p>
<p>These days developers need to build complex websites and web apps, and above a certain complexity level it can take too much time and hassle to always start from scratch, hence came the need for a more structured natural way of development. PHP frameworks provide developers with an adequate solution for that.</p>
<p>Choosing the right PHP development framework to develop a web application for a business can be a difficult task because there are so many options available. For the past few years, we at ValueCoders have been using the Laravel framework on a regular basis. We have worked with and tested other PHP development frameworks as well, but we needed some additional features and capabilities. In a previous blog we discussed the top PHP development frameworks; in this post, we will discuss what made us feel that Laravel is the best PHP framework in 2018. Currently, Laravel has 38,132 stars on GitHub. The picture below shows how many websites are currently built using Laravel.</p>
<p></p>
<h3 id=”laravel”><span id=”1_Laravel”>1. Laravel</span></h3>
<p>Coming in at number 1 on our list is Laravel.<span> </span><a href=”https://laravel.com/” target=”_blank” rel=”noopener”>Laravel</a><span> </span>is a comprehensive framework designed for rapidly building applications using the MVC architecture. Laravel is currently the most popular PHP framework, with a huge community of developers.</p>
<p>It features tons of Laravel-specific packages, the lightweight Blade templating engine, unit testing, an ORM, a packaging system, and RESTful controllers, and Laravel was the first framework to introduce routing in an abstract way, which takes the hassle out of code organization.</p>
<p>Queue management is another feature: tasks that would normally run in the foreground are handled in the background, with the activity logged for you. Packages can be easily added with the robust Composer built into Laravel. It integrates with Gulp and Elixir, so any npm and Bower packages can be called directly via SSH.</p>
<p>One of the things Laravel handles best is NoSQL structures like<span> </span><a href=”http://coderseye.com/learn-mongodb-tutorials-as-a-beginner/” target=”_blank” rel=”noopener”>MongoDB</a><span> </span>or Redis. It’s easy to get started with Laravel thanks to its extensive documentation, its popularity, and popular Udemy videos and tutorials meant to get developers new to Laravel up and running.</p>
<p>We have looked for the most comprehensive<span> </span><strong>free</strong><span> </span>PHP course for our readers and found the folks at<span> </span><a href=”https://coderseye.com/team-treehouse-review”>Team Treehouse</a><span> </span>do by far the best job. You can sign up for free and start learning how to be a PHP pro.</p>
<p><a class=”orange-button” href=”https://coderseye.com/udemy-php-19″ target=”_blank” rel=”nofollow noopener”>90% off Training</a></p>
<p>Download and Info: <a href=”https://laravel.com/” rel=”nofollow”>https://laravel.com/</a></p>
<h3 id=”phalcon”><span id=”2_Phalcon”>2. Phalcon</span></h3>
<p>Phalcon is an MVC-based PHP framework, uniquely built as a C extension, meaning it’s absolutely blazing fast. Phalcon uses very few resources in comparison to other frameworks, translating into very fast processing of HTTP requests, which can be critical for developers working with systems that don’t offer much overhead.</p>
<p>Phalcon has been actively developed since 2012, and includes ORM, MVC, caching, and auto-loading components. Its latest and first long term support release includes support for PHP 7.</p>
<p>Phalcon brings developers data storage tools such as its own SQL dialect, PHQL, as well as Object Document Mapping for MongoDB. Other features include template engines, form builders, ease of building applications with international language support, and more. Phalcon is ideal for building both high-performance REST APIs and full-fledged web applications.</p>
<p>Download and Info: <a href=”https://phalconphp.com/en/” rel=”nofollow”>https://phalconphp.com/en/</a></p>
<h3 id=”codeignitor”><span id=”3_Codeigniter”>3. Codeigniter</span></h3>
<p>Codeigniter is an ideal framework for rapid application development. It’s a lightweight, low-hassle framework with a small footprint that can be installed just by uploading it directly to your hosting. No special command line or software installation is required. Upload the files and you’re ready to go.</p>
<p>Building full-fledged web applications is a breeze thanks to its small learning curve and numerous libraries. Speaking of ease of development, Codeigniter’s documentation is extensive, and its community is vast and very helpful. Codeigniter is backed by an academic entity as well, the British Columbia Institute of Technology, which will help ensure its continued development and growth.</p>
<p>Feature-wise, Codeigniter comes with many built in libraries for unit testing, form validation, email, sessions, and much more! If you can’t find a library you’re looking for, it’s also pretty easy to build your own, and then share it with the community.</p>
<p>Download and Info: <a href=”https://www.codeigniter.com/” rel=”nofollow”>https://www.codeigniter.com/</a></p>
<h3 id=”symphony”><span id=”4_Symfony”>4. Symfony</span></h3>
<p>Symfony has been touted for a while now as a very stable, high-performance, well-documented, and modular project. Symfony is backed by the French company SensioLabs, and has been developed by them and its community into a fantastic framework.</p>
<p>Symfony is used by many big-name companies like the BBC, and by open source projects such as Drupal and eZ Publish. Symfony was written with stability in mind in a very professional way. Its documentation is extensive, and its community is just as vast.</p>
<p>Download and Info: <a href=”https://symfony.com/” rel=”nofollow”>https://symfony.com/</a></p>
<h3 id=”cakephp”><span id=”5_Cakephp”>5. Cakephp</span></h3>
<p>CakePHP is an ideal framework for beginners and for rapidly developing commercial web apps. It comes with code generation and scaffolding functionality to speed up the development process, while also bringing in tons of packages to take care of common functionality.</p>
<p>It’s unique in having MVC conventions that help guide the development process. Configuration is also a breeze as it removes the need for complicated XML or YAML config files. Builds are fast and its security features include measures to prevent XSS, SQL Injection, CSRF, and tools for form validation.</p>
<p>CakePHP is under active development with good documentation and lots of support portals to help get started. Premium support is also an option for developers who choose to use CakePHP, via the Cake Development Corporation.</p>
<p>Download and Info: <a href=”http://cakephp.org/” rel=”nofollow”>http://cakephp.org/</a></p>
<h3 id=”zend”><span id=”6_Zend_Framework”>6. Zend Framework</span></h3>
<p>Zend Framework is a popular, go-to professional framework commonly used for high-performance enterprise-level applications. Zend is built with security, performance, and extensibility in mind.</p>
<p>Because of its focus on enterprise applications, it has tons of components for authentication, feeds, forms, services, and more. Because of its enterprise-driven nature, Zend isn’t ideal for rapid application development, though it does come with tools to make a developer’s life easier, including Zend’s proprietary IDE, Zend Studio, which neatly integrates with Zend Framework.</p>
<p>Download and Info: <a href=”https://framework.zend.com/” rel=”nofollow”>https://framework.zend.com/</a></p>
<h3 id=”fuelphp”><span id=”7_Fuel_PHP”>7. Fuel PHP</span></h3>
<p>FuelPHP is a sophisticated, modern, highly modular, extensible MVC PHP framework that is built with HMVC architecture in mind. It features lightweight and powerful ORM support, template parsing, security enhancements, its own authentication framework, and many packages to further extend a developer’s capabilities.</p>
<p>Because of its community-driven nature, FuelPHP is actively developed, and planned changes (in V2) include making it fully object-oriented, with the ability to install the framework using Composer, as well as support for multiple applications on a single installation.</p>
<p>Download and Info: <a href=”http://fuelphp.com/” target=”_blank” rel=”noopener”>http://fuelphp.com/</a></p>
<h3 id=”slim”><span id=”8_Slim”>8. Slim</span></h3>
<p>Slim is a very minimal micro-framework inspired by Ruby’s Sinatra. It’s best utilized for building lightweight RESTful APIs, with its built-in standard and add-on features such as URL handling &amp; routing and HTTP caching. Developing with Slim is easy as well, since it is very actively maintained and has extensive, beginner-friendly documentation.</p>
<p>Download and Info: <a href=”http://www.slimframework.com/” rel=”nofollow”>http://www.slimframework.com/</a></p>
<h3 id=”phpixie”><span id=”9_Phpixie”>9. Phpixie</span></h3>
<p>Phpixie is a relatively new framework, designed to be lightweight, modularized, and easy to get started with. It also compiles fast.</p>
<p>It comes bundled with great tools for cryptography and security, support for MongoDB, and code sharing with composer, all right out of the box. Some downsides to Phpixie however are its relatively few modules, and lower popularity in comparison to other frameworks.</p>
<p>Download and Info: <a href=”https://phpixie.com/” rel=”nofollow”>https://phpixie.com/</a></p>
<h3 id=”fatfree”><span id=”10_Fat_Free”>10. Fat Free</span></h3>
<p>Fat-Free is a very modular PHP micro framework with tons of packages that put it between a true micro-framework and a full-fledged PHP framework such as Laravel. Fat-Free comes jam-packed with packages for unit testing, image processing, CSS compression, data validation, Open ID, and much more.</p>
<p>Fat-Free has off-the-shelf support for both SQL and NoSQL databases, and makes the development of multilingual web apps very easy. That being said, Fat-Free is nothing really new, and it’s kind of overkill for a micro-framework.</p>
<p>Download and Info: <a href=”https://fatfreeframework.com/home” rel=”nofollow”>https://fatfreeframework.com/home</a></p>
<p></p>
<p><em>For website design &amp; development or mobile application development, you can contact us <a href=”https://www.yugtechnology.com” target=”_blank” rel=”noopener”>here</a>.</em></p>



Mathematical Olympiads for Undergrad Students tag:www.analyticbridge.datasciencecentral.com,2018-05-25:2004291:BlogPost:384136
2018-05-25T15:00:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>Mathematical Olympiads are popular among high school students. However, there is nothing similar for college students, except maybe <a href=”http://www.imc-math.org.uk/” target=”_blank” rel=”noopener”>IMC</a>. Even IMC is not popular. It focuses mostly on the same kind of problems as high school Olympiads, and you can not participate if you are over 23 years old. In addition, it is organized by country, as opposed to globally, thus favoring countries with a large population. Topics such as probability are never considered.</p>
<p>This is an opportunity to create Mathematical Olympiads for college students, with no age or country restrictions. It could be organized online, offering interesting, varied, and challenging problems, allowing participants to read literature about the problems, and have a few weeks to submit a solution. In short, something like Kaggle competitions, except that Kaggle focuses exclusively on machine learning, coding, and data processing. Not sure where the funding could come from, but if I decided to organize this kind of competition, I would be able to fund it myself. </p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220291730?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220291730?profile=original” width=”299″ class=”align-center”/></a></p>
<p>Below are examples of problems that I would propose. They do not require knowledge beyond advanced undergrad level in math, statistics, or probability. They are more difficult, and more original, than typical exam questions. Participants are encouraged to use tools such as <a href=”https://www.datasciencecentral.com/profiles/blogs/great-mathematical-api-by-wolfram” target=”_blank” rel=”noopener”>WolframAlpha</a> to automatically compute integrals or solve systems of equations involved in these problems.</p>
<p>Is anyone interested in this new initiative? I could see this helping students not enrolled in a top university, though the majority of winners would probably come from a top school.</p>
<p><strong>Problem 1</strong></p>
<p>Using complex analysis or other means, prove that <a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220296037?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220296037?profile=original” width=”554″ class=”align-center”/></a></p>
<p>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/two-beautiful-mathematical-results-part-2″ target=”_blank” rel=”noopener”>here</a>. </p>
<p><strong>Problem 2</strong></p>
<p><span>Points are randomly distributed on the plane, with an average of </span><em>m</em><span> points per unit area. A circle of radius </span><em>R</em><span> is drawn around each point. What is the proportion of the plane covered by these (possibly overlapping) circles? How can you use this problem to compute an approximation of exp(-Pi)?</span></p>
<p><span>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/little-stochastic-geometry-problem-random-circles” target=”_blank” rel=”noopener”>here</a>. </span></p>
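A Monte Carlo sanity check of the coverage (an illustrative sketch assuming NumPy; it assumes the covered proportion is 1 − exp(−m Pi R^2), the standard Boolean-model result, consistent with the exp(−Pi) hint in the problem; the box size, seed, and torus wrap-around are choices made for the example):

```python
import numpy as np

# Poisson point process with intensity m on an L x L torus; a location is
# covered iff some circle center lies within distance R of it.
rng = np.random.default_rng(0)
L, m, R = 40.0, 1.0, 1.0
centers = rng.uniform(0, L, size=(rng.poisson(m * L * L), 2))
samples = rng.uniform(0, L, size=(3000, 2))

# Toroidal distances from every sample location to every center.
dx = np.abs(samples[:, None, 0] - centers[None, :, 0])
dy = np.abs(samples[:, None, 1] - centers[None, :, 1])
dx = np.minimum(dx, L - dx)
dy = np.minimum(dy, L - dy)
covered = (dx ** 2 + dy ** 2 <= R ** 2).any(axis=1)

estimate = covered.mean()
theory = 1 - np.exp(-m * np.pi * R ** 2)  # about 0.9568 for m = R = 1
print(estimate, theory)
```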
<p><strong>Problem 3</strong></p>
<p><span>What is the minimal correlation between two random variables if the marginal distributions are exponential? Prove that it cannot be lower than 1 – (Pi^2 / 6) ≈ -0.645. Provide an example where the lower bound is attained.</span></p>
<p><span>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/9-off-th-beaten-path-statistical-science-topics” target=”_blank” rel=”noopener”>here</a> (in section 9.)</span></p>
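The lower bound is attained by the antithetic coupling X = −log U, Y = −log(1 − U) with U uniform on (0, 1); a simulation sketch (illustrative, standard library only):

```python
import random
from math import log, pi

# Antithetic exponential pair: both marginals are Exp(1), and the sample
# correlation approaches the lower bound 1 - pi^2/6.
random.seed(42)
n = 200_000
xs, ys = [], []
for _ in range(n):
    u = random.random()
    xs.append(-log(u))
    ys.append(-log(1 - u))

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
vx = sum((a - mx) ** 2 for a in xs) / n
vy = sum((b - my) ** 2 for b in ys) / n
corr = cov / (vx * vy) ** 0.5
print(corr, 1 - pi ** 2 / 6)  # both close to -0.645
```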
<p><strong>Problem 4</strong></p>
<p>A special version of the logistic map is produced by an iterative algorithm, as follows: the seed <em>x</em> = <em>x</em>(1) is anywhere in [0, 1] and <em>x</em>(<em>n</em>+1) = <em>g</em>(<em>x</em>(<em>n</em>)) with <em>g</em>(<i>y</i>) = SQRT(4*<i>y</i>*(1-<i>y</i>)). The equilibrium distribution satisfies the stochastic integral equation P(<em>X</em>  &lt;  <em>y</em>) = P(<em>g</em>(<em>X</em>)  &lt;  <em>y</em>). You can look at this as if <em>X</em> is a random variable with the observed values being the successive values of <em>x</em>(<em>n</em>). The equilibrium distribution does not depend on the seed <em>x</em> except for very rare, bad seeds, that do not behave well. Solve the stochastic integral equation to derive the equilibrium distribution. Also, what is the theoretical correlation between <em>x</em>(<em>n</em>) and <em>x</em>(<em>n</em>+1), at equilibrium? (answer: -1/2.)</p>
<p>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/amazing-random-sequences-with-cool-applications” target=”_blank” rel=”noopener”>here</a> (in last section.)</p>
<p><strong>Problem 5</strong></p>
<p>Prove the following result, which has been used to provide one of the most elementary proofs of the prime number theorem. This is a classic algebraic result that applies to many sequences of slowly increasing positive integers, not just to prime numbers. If<span> </span><em>Q</em><span> </span>is an infinite set of positive integers, with<span> </span><em>Q</em>(<em>n</em>) being the subset of all integers in<span> </span><em>Q</em><span> </span>that are less than or equal to<span> </span><em>n</em>, then under rather general conditions (identify these conditions), we have</p>
<p><a href=”https://storage.ning.com/topology/rest/1.0/file/get/2220296134?profile=original” target=”_self”><img src=”https://storage.ning.com/topology/rest/1.0/file/get/2220296134?profile=original” width=”161″ class=”align-center”/></a></p>
<p>where<span> </span><em>d</em>(<em>p</em>) is the difference between<span> </span><em>p</em><span> </span>and the largest element of<span> </span><em>Q</em><span> </span>that is smaller than<span> </span><em>p</em>.</p>
<p>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/simple-proof-of-prime-number-theorem” target=”_blank” rel=”noopener”>here</a>. </p>
<p><strong>Problem 6</strong></p>
<p>Prove the following result:</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220296938?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220296938?profile=original” width=”555″ class=”align-center”/></a></p>
<p>You can find the solution <a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220297063?profile=original” target=”_self”>in this paper</a> (PDF document.)</p>
<p><strong>Problem 7</strong></p>
<p>Prove the following result:</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220297207?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220297207?profile=original” width=”401″ class=”align-center”/></a></p>
<p>See solution <a href=”https://www.datasciencecentral.com/profiles/blogs/two-beautiful-mathematical-results” target=”_blank” rel=”noopener”>here</a>. </p>
<p><strong>Problem 8</strong></p>
<p>Consider the following iterative algorithm:</p>
<p><span class=”font-size-2″><em>p</em>(0) = 0, <em>p</em>(1)= 1, <em>e</em>(1) = 2</span></p>
<p><span class=”font-size-2″><strong>If</strong><span> </span>4<em>p</em>(<em>n</em>) + 1 &lt; 2<em>e</em>(<em>n</em>)<span> </span><strong>Then</strong></span></p>
<ul>
<li><span><em>p</em>(<em>n</em>+1) = 2<em>p</em>(<em>n</em>) + 1</span></li>
<li><span><em>e</em>(<em>n</em>+1) = 4<em>e</em>(<em>n</em>) – 8<em>p</em>(<em>n</em>) – 2</span></li>
<li><span><em>d</em>(<em>n</em>+1) = 1</span></li>
</ul>
<p><span class=”font-size-2″><strong>Else</strong></span></p>
<ul>
<li><span><em>p</em>(<em>n</em>+1) = 2<em>p</em>(<em>n</em>)</span></li>
<li><span><em>e</em>(<em>n</em>+1) = 4<em>e</em>(<em>n</em>)</span></li>
<li><span><em>d</em>(<em>n</em>+1) = 0</span></li>
</ul>
<p>Note that <em>d</em>(<em>n</em>+1) = <em>p</em>(<em>n</em>+1) – 2<em>p</em>(<em>n</em>).</p>
<p>Prove that <em>d</em>(<em>n</em>) is the <em>n</em>-th binary digit of SQRT(2)/2.</p>
<p>See solution <a href=”https://www.analyticbridge.datasciencecentral.com/forum/topics/challenge-of-the-week-square-root-of-two” target=”_blank” rel=”noopener”>here</a>.</p>
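<p>A minimal Python sketch (the helper name <em>iter_bits</em> is mine) runs the iteration above and checks the claim against an exact reference: since 2^<em>n</em> · SQRT(2)/2 = SQRT(2^(2<em>n</em>−1)), the <em>n</em>-th binary digit of SQRT(2)/2 is isqrt(2^(2<em>n</em>−1)) mod 2.</p>

```python
from math import isqrt

def iter_bits(n_max):
    """Run the iteration above and collect d(1), ..., d(n_max)."""
    p, e = 1, 2                     # p(1) = 1, e(1) = 2
    bits = [1]                      # d(1) = p(1) - 2*p(0) = 1
    for _ in range(n_max - 1):
        if 4 * p + 1 < 2 * e:
            # tuple assignment evaluates the right-hand side first,
            # so e(n+1) = 4e(n) - 8p(n) - 2 uses the old p(n)
            p, e, bit = 2 * p + 1, 4 * e - 8 * p - 2, 1
        else:
            p, e, bit = 2 * p, 4 * e, 0
        bits.append(bit)
    return bits

# Exact reference: n-th binary digit of SQRT(2)/2 is isqrt(2**(2n-1)) % 2
exact = [isqrt(2 ** (2 * n - 1)) % 2 for n in range(1, 51)]
assert iter_bits(50) == exact
```

<p>The check passing for the first 50 digits is evidence, not a proof; the proof is in the linked solution.</p>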
<p><strong>Problem 9 </strong></p>
<p><span>The recursive relation </span><em>g</em><span>(</span><em>n</em><span>) = </span><em>n</em><span> − </span><em>g</em><span>(</span><em>g</em><span>(</span><em>n</em><span> − 1)), </span><em>g</em><span>(0) = 0, appears in the context of Fibonacci numbers, as you can see in </span><a name=”bBIB1″ href=”https://www.sciencedirect.com/science/article/pii/0022314X88900200#BIB1″ class=”workspace-trigger” id=”bBIB1″>Hofstadter [“Gödel, Escher, Bach,” pp. 151–154, InterEditions, 1985]</a><span>. Prove that</span></p>
<p><span><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220297082?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220297082?profile=original” width=”220″ class=”align-center”/></a></span></p>
<p><span>See solution <a href=”https://www.sciencedirect.com/science/article/pii/0022314X88900200″ target=”_blank” rel=”noopener”>here</a>,  published in <em>Journal of Number Theory</em>.</span></p>
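<p>This is Hofstadter’s G sequence (A005206 in the OEIS), whose well-known closed form is <em>g</em>(<em>n</em>) = ⌊(<em>n</em> + 1)/φ⌋ with φ the golden ratio. A short memoized script can check the recurrence against that closed form numerically; this is an illustration, not the published proof:</p>

```python
from functools import lru_cache
from math import floor

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

@lru_cache(maxsize=None)
def g(n):
    """Hofstadter's G sequence: g(n) = n - g(g(n-1)), g(0) = 0."""
    return 0 if n == 0 else n - g(g(n - 1))

# Check the closed form g(n) = floor((n+1)/phi) for small n.
# Computing in increasing order keeps the memoized recursion shallow.
for n in range(200):
    assert g(n) == floor((n + 1) / PHI)
print([g(n) for n in range(10)])
```
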
<p><strong>Problem 10</strong></p>
<p><span>On the digits of Pi/4 in base <em>b</em>. Prove that if <em>b</em> is an even integer and <em>n</em> &gt; 3, then the <em>n</em>-th digit of <em>x</em> = Pi/4 is equal to INT(<em>b</em> * <em>x</em>(<em>n</em>)), with </span></p>
<p><span><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220297280?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220297280?profile=original” width=”211″ class=”align-center”/></a></span><span>The first digit (<em>n</em> = 1) starts after the decimal point. Show that this formula is not valid in general, for other values of <em>x</em>, or if <em>b</em> is an odd integer. You can find more about this <a href=”https://www.datasciencecentral.com/profiles/blogs/number-representation-systems-explained-in-one-picture” target=”_blank” rel=”noopener”>in the following article</a>.</span></p>
<p><span style=”font-size: 14pt;”><b>DSC Resources</b></span></p>
<ul>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/check-out-our-dsc-newsletter”>Subscribe to our Newsletter</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/comprehensive-repository-of-data-science-and-ml-resources”>Comprehensive Repository of Data Science and ML Resources</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/advanced-machine-learning-with-basic-excel”>Advanced Machine Learning with Basic Excel</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/difference-between-machine-learning-data-science-ai-deep-learning”>Difference between ML, Data Science, AI, Deep Learning, and Statistics</a></li>
<li><a href=”https://www.datasciencecentral.com/profiles/blogs/my-data-science-machine-learning-and-related-articles”>Selected Business Analytics, Data Science and ML articles</a></li>
<li><a href=”http://careers.analytictalent.com/jobs/products”>Hire a Data Scientist</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/page/search?q=Python”>Search DSC</a><span> </span>|<span> </span><a href=”http://classifieds.datasciencecentral.com”>Classifieds</a><span> </span>|<span> </span><a href=”http://www.analytictalent.com”>Find a Job</a></li>
<li><a href=”http://www.datasciencecentral.com/profiles/blog/new”>Post a Blog</a><span> </span>|<span> </span><a href=”http://www.datasciencecentral.com/forum/topic/new”>Forum Questions</a></li>
</ul>
<p></p>



The Role of Predictive Analytics in Medical Diagnosis tag:www.analyticbridge.datasciencecentral.com,2018-05-22:2004291:BlogPost:383823
2018-05-22T08:30:00.000Z


Goli Tajadod
https://www.analyticbridge.datasciencecentral.com/profile/GoliTajadod



<p><span>Predictive analytics uses current and historical data to determine the probability of a particular outcome. This is a particularly powerful approach when applied to medical diagnosis. In an effort to reduce misdiagnosis, historical data on former patients’ symptoms can be applied to the assessment of a new patient.</span></p>
<p></p>
<p><span>While doctors are the ultimate experts and decision-makers, using predictive analytics as a means of establishing precedent for particular patient ailments can be of significant benefit in lieu of a second or even a third or fourth opinion. Predictive analytics can help doctors to make even more informed and insightful diagnoses by using current and/or historical data. After all, the diagnosis phase is certainly the most important when it is put into the context of the patient’s overall journey from definition of initial symptoms to ultimate recovery. This is particularly important when one considers that misdiagnosis accounts for as much as one third of all medical errors, and diagnosis error can be three times more common than prescription error.*</span></p>
<p></p>
<p><span>The latest version of FlexRule includes a new sample project called Predictive Analytics. This describes how a patient’s history at a particular medical centre during a specific period of time could help doctors validate the diagnoses for new patients with a similar set of symptoms during the same time frame.</span></p>
<p></p>
<p><span>At the highest level, the FlexRule project involves three main actions. These are as follows:</span></p>
<p></p>
<ol>
<li><span>Read the patient’s historical data</span></li>
<li><span>Use the Naïve Bayes (NB) Algorithm to train the model</span></li>
<li><span>Predict the diagnosis</span></li>
</ol>
<p><strong><span><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220287697?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220287697?profile=original” width=”458″ class=”align-left”/></a></span></strong></p>
<ol>
<li><strong><span>Reading the Patient’s Historical Data<br/> <br/></span></strong> During this first stage of the FlexRule project, the patient’s data is read from a Data Sheet (in this case a CSV file). This data sheet includes 20 patients, all of whom have different symptoms associated with their doctor’s final diagnoses and associated lab tests. For example, the patient on the first row experienced a sore throat and sneezing, whereas the second patient did not have a sore throat, but did have a stuffy nose, as well as symptoms of fatigue. Both patients were subsequently diagnosed with colds.</li>
</ol>
<p></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220288712?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220288712?profile=original” width=”508″ class=”align-center”/></a></p>
<p><span> </span></p>
<ol start=”2″>
<li><strong><span>Use the Naïve Bayes (NB) Algorithm to train the model<br/> <br/></span></strong> The next step is to train the new predictive model or reload an existing model by using the Naïve Bayes (NB) algorithm. This is a scalable classification algorithm for the type of dataset used in the FlexRule project. The NB algorithm is often applied successfully to medical problems with large repositories of data and features. At this point, the FlexRule model is ready to predict a diagnosis for patients who display similar symptoms to those already assessed. </li>
</ol>
<p><span><br/></span></p>
<ol start=”3″>
<li><strong><span>Predict the Diagnosis<br/> <br/></span></strong> In the final step, the data relating to a new patient’s symptoms is passed to the predictive model. Using all of the available historical data allied with the new patient’s symptoms, the FlexRule predictive model then calculates the percentage probability of particular diagnoses as shown below:</li>
</ol>
<p></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220288800?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220288800?profile=original” width=”403″ class=”align-center”/></a><br/> <br/> In this example, it is very clear that our new patient will most likely be diagnosed as having a cold, as the probability is shown as 74.81%. The probability of our patient having the flu or allergies is 8.76% and 16.41% respectively.</p>
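<p>The three-step flow (read historical rows, train a Naïve Bayes model, turn the class scores into percentage probabilities) can be sketched in plain Python. The symptom names, training rows, and helper functions below are hypothetical stand-ins for the FlexRule project’s CSV data sheet and API, shown only to illustrate the counting-with-smoothing idea behind NB:</p>

```python
from collections import Counter, defaultdict

# Step 1: hypothetical historical rows (symptom flags -> doctor's diagnosis);
# the real project reads 20 such rows from a CSV data sheet.
rows = [
    ({"sore_throat": 1, "sneezing": 1, "stuffy_nose": 0, "fatigue": 0}, "cold"),
    ({"sore_throat": 0, "sneezing": 0, "stuffy_nose": 1, "fatigue": 1}, "cold"),
    ({"sore_throat": 1, "sneezing": 0, "stuffy_nose": 0, "fatigue": 1}, "flu"),
    ({"sore_throat": 0, "sneezing": 1, "stuffy_nose": 1, "fatigue": 0}, "allergy"),
]

# Step 2: "training" Naive Bayes is just counting labels and feature values
def train(rows):
    priors = Counter(label for _, label in rows)
    counts = defaultdict(Counter)          # (label, feature) -> value counts
    for feats, label in rows:
        for f, v in feats.items():
            counts[label, f][v] += 1
    return priors, counts, len(rows)

# Step 3: score each diagnosis, with Laplace smoothing for unseen values
def predict(model, feats):
    priors, counts, n = model
    scores = {}
    for label, class_count in priors.items():
        p = class_count / n                # prior P(label)
        for f, v in feats.items():
            p *= (counts[label, f][v] + 1) / (class_count + 2)
        scores[label] = p
    total = sum(scores.values())
    return {label: p / total for label, p in scores.items()}  # normalize

model = train(rows)
probs = predict(model, {"sore_throat": 0, "sneezing": 1, "stuffy_nose": 1, "fatigue": 1})
print({label: f"{100 * p:.2f}%" for label, p in probs.items()})
```

<p>With these toy rows the new patient scores highest for a cold; the percentages differ from the FlexRule screenshot, which was produced from the project’s own 20-row data sheet.</p>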
<p>While this is a very simple example, it shows how FlexRule and predictive modelling can help a doctor make better quality patient diagnosis.</p>
<p><span>No doubt most patients would also agree with this type of approach to medical diagnoses!</span></p>
<p><span>Read more <a href=”http://www.flexrule.com” target=”_blank” rel=”noopener”>here</a>. </span></p>
<p></p>
<p><em>* Ian Ayres; Super Crunchers, How anything can be predicted, page 97</em></p>



I Analyzed 10 MM digits of SQRT(2) – Look at My Findings tag:www.analyticbridge.datasciencecentral.com,2018-04-01:2004291:BlogPost:381776
2018-04-01T04:30:00.000Z


Vincent Granville
https://www.analyticbridge.datasciencecentral.com/profile/VincentGranville



<p>This article is intended for practitioners who might not necessarily be statisticians or statistically savvy. The mathematical level is kept as simple as possible, yet I present an original, simple approach to testing for randomness, with an interesting application to illustrate the methodology. This material is not usually discussed in textbooks or classrooms (even for statistics students), offering a fresh perspective and out-of-the-box tools that are useful in many contexts, as an addition or alternative to the traditional tests that are widely used. This article is written as a tutorial, but it also features an interesting research result in the last section. The example used in this tutorial shows how<span> intuition can be wrong, and why you need data science.</span></p>
<p>The main question that we want to answer is: are some events occurring randomly, or is there a mechanism making them non-random? What is the gap distribution between two successive events of the same type? In a time-continuous setting (a Poisson process) the distribution in question is modeled by the exponential distribution. In the discrete case investigated here, the discrete Poisson process turns out to be a Markov chain, and we are dealing with geometric, rather than exponential, distributions. Let us illustrate this with an example.</p>
<p><strong>Example</strong></p>
<p>The digits of the square root of two (SQRT(2)) are believed to be distributed as if they were occurring randomly. Each of the 10 digits 0, 1, … , 9 appears with a frequency of 10% based on observations, and at any position in the decimal expansion of SQRT(2), on average the next digit does not seem to depend on the value of the previous digit (in short, its value is unpredictable). An event in this context is defined, for example, as a digit being equal to (say) 3. The next event is the first time when we find a subsequent digit also equal to 3. The<span> </span><em>gap</em><span> </span>(or time elapsed) between two occurrences of the same digit is the main metric that we are interested in, and it is denoted as<span> </span><em>G</em>. If the digits were distributed just like random numbers, the distribution of the gap<span> </span><em>G</em> between two occurrences of the same digit would be geometric.</p>
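<p>The claim can be checked empirically with a few thousand digits of SQRT(2) computed via Python’s <em>decimal</em> module: if the digits behave like random numbers, the observed gap frequencies for (say) the digit 3 should track the geometric probabilities P(<em>G</em> = <em>g</em>) = 0.1 × 0.9^(<em>g</em>−1), with a mean gap close to 10. This is an illustrative sketch, not the full battery of tests developed in the article:</p>

```python
from decimal import Decimal, getcontext
from collections import Counter

getcontext().prec = 10000
digits = str(Decimal(2).sqrt())[2:]        # digits of SQRT(2) after the point

# gaps between successive occurrences of the digit 3
pos = [i for i, d in enumerate(digits) if d == "3"]
gaps = [b - a for a, b in zip(pos, pos[1:])]

counts = Counter(gaps)
n = len(gaps)
for g in range(1, 6):
    geometric = 0.1 * 0.9 ** (g - 1)       # P(G = g) for truly random digits
    print(g, round(counts[g] / n, 4), round(geometric, 4))
```

<p>Agreement with the geometric column is consistent with randomness but does not prove it; the full article develops the formal test.</p>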
<p>Do you see any pattern in the digits below? <span style=”text-decoration: underline;”><a href=”https://www.datasciencecentral.com/profiles/blogs/stochastic-processes-new-tests-for-randomness-application-to-numb” target=”_blank” rel=”noopener”><strong>Read full article here</strong></a></span> to find the answer, and to learn more about a powerful statistical technique.</p>
<p><span><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220290758?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220290758?profile=original” width=”528″ class=”align-center”/></a></span></p>



What is an Analytics Translator and Why is the Role Important to Your Organization? tag:www.analyticbridge.datasciencecentral.com,2018-02-23:2004291:BlogPost:381249
2018-02-23T09:30:00.000Z


Kartik Patel
https://www.analyticbridge.datasciencecentral.com/profile/KartikPatel



<p><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220284274?profile=original” width=”700″/></p>
<p>Today, enterprises recognize the critical value of advanced analytics within the organization and they are implementing data democratization initiatives. As these initiatives evolve, new roles emerge in the organization. The newest of these analysis-related roles is the<span> </span><strong>‘analytics translator’</strong>. As the enterprise considers the relevance of this new role within the business, it is important to understand the responsibilities of an Analytics Translator, and how this role might help the organization to achieve its goals. </p>
<p><strong><em>What is an Analytics Translator?</em></strong></p>
<p>The Analytics Translator is an important member of the new analytical team. As organizations encourage data democratization and implement self-serve business intelligence and advanced analytics, business users can leverage machine learning, self-serve data preparation, and predictive analytics to gather, prepare, and analyze data. The emerging role of Analytics Translator adds resources to a team that includes IT, data scientists, data architects and others.</p>
<p>Analytics Translators do not have to be analytical specialists or trained professionals. With the right tools, they can easily translate data and analysis without the skills of a highly trained data pro.</p>
<blockquote>Using their knowledge of the business and their area of expertise, translators can help the management team focus on targeted areas like production, distribution, pricing and even cross-functional initiatives.</blockquote>
<p>With self-serve, advanced analytics tools, translators can then identify patterns, trends and opportunities, and problems. This information is then handed off to data scientists and professionals to further clarify and produce crucial reports and data with which management teams can make strategic and operational decisions.</p>
<p><strong><em>Why is an Analytics Translator Important to Your Organization?</em></strong></p>
<p>IT resources and data professionals are typically in short supply within an organization and, if the enterprise wishes to increase staff, the cost of these highly skilled professionals can be prohibitive. In the average organization, these resources are usually stretched thin and time is wasted on projects that are:</p>
<ul>
<li>Too complex for business team members</li>
<li>Ill-conceived, or inappropriate for attention at the data scientist or IT level</li>
<li>Comprised of incomplete requirements</li>
<li>Required for day-to-day or immediate analysis or data sharing initiatives</li>
<li>Tactical or low-level operational in nature</li>
</ul>
<p>The time it takes for a data professional or IT professional to review a project and assign a priority takes them away from more strategic or more critical tasks and, in the process, the business user may miss day-to-day deadlines or information that is critical to them. The data professional may also need more information on requirements, which will further delay the project. There are many examples of unnecessary or inappropriate data analysis requests, and many instances where a business user with access to analytical tools might be able to do the work themselves. But there are even more examples of projects or analytical requirements that fall somewhere between the skills of a business user and those of a trained data scientist, and just as many examples of poorly understood or poorly translated data analysis that sends a business user off in the wrong direction.</p>
<p>That is where the Analytics Translator comes in. Using her or his knowledge of the industry, the organization, the team and the analytics tools, the translator can play a crucial role in understanding requirements, preparing data and producing and explaining information in a way that is accurate and clear. As this role evolves within your organization, you will find that, by allowing the average business user to work with the Analytics Translator, that business user will become more knowledgeable and skilled in interpreting and understanding data.</p>
<p><strong><em>The Ideal Analytics Translator</em></strong></p>
<p>When identifying possible candidates to perform the Analytics Translator role, the organization should look for skills that can be nurtured and optimized as an asset.</p>
<ul>
<li>A power user of self-serve BI tools</li>
<li>Recognized as an expert in a functional, industry or organizational role</li>
<li>Comfortable with building and presenting reports and use cases</li>
<li>Works well with technical and management teams</li>
<li>Manages projects, milestones and dependencies with ease</li>
<li>Able to translate analysis and conclusions into actionable recommendations</li>
<li>Comfortable with metrics, measurements and prioritization</li>
<li>Acts as a role model for user and team member adoption of new processes and data-driven decisions</li>
</ul>
<p>If this role is recognized as important to the organization, most enterprises will structure a logical program to identify and train candidates to ensure uniform skills and performance.</p>
<p>By combining domain, organizational and industry skills with self-serve analytical tools, the Analytics Translator can help the enterprise to achieve low total cost of ownership (TCO) and rapid return on investment (ROI) for its business intelligence and advanced analytics initiatives and can encourage and nurture data democratization and optimal analytical business results within the organization.</p>
<blockquote><strong>Citizen Data Scientists/Citizen Analysts<span> </span></strong>play a crucial role in day-to-day analysis and decision-making, using self-serve business intelligence tools.<span> </span><strong>Analytics Translators </strong>bridge the gap between IT, data scientists and business users, and move initiatives forward by acting as a liaison and topic expert to help the organization focus on the right things to achieve its goals.</blockquote>
<p></p>
<p>As self-serve Advanced Analytics and data democratization become more common across industries and organizations, the role of the Analytics Translator will also become more important. As a power user of BI tools and Self-Serve Analytics, the translator functions as a liaison between critical analytical and technical resources and the business user community and ensures that BI tools will be adopted and shared across the enterprise.</p>
<p>In our next article, we will consider the difference between the Analytics Translator and the Citizen Data Scientist or Citizen Analyst.</p>



What is Clickless Analysis? Can it Simplify Adoption of Augmented Analytics? (Part 1 of 3 articles) tag:www.analyticbridge.datasciencecentral.com,2018-01-25:2004291:BlogPost:380540
2018-01-25T12:30:00.000Z


Kartik Patel
https://www.analyticbridge.datasciencecentral.com/profile/KartikPatel



<p><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220282060?profile=original” width=”700″/></p>
<p>The concept of Clickless Analytics is one that will be happily embraced by business users and by the business enterprise. The reason is simple! Clickless Analytics allows users to find and analyze information without specialized skills, by using natural language.</p>
<p>In this, the first of a three-part series we discuss Clickless Analytics and how it can simplify user adoption of augmented analytics.</p>
<p><em><strong>What is Clickless Analytics?</strong></em></p>
<p>Clickless Analytics incorporates Natural Language Processing (NLP) and takes augmented analytics to the next level with machine learning and NLP in a self-serve environment that is easy enough for every business user. Business users can leverage sophisticated business intelligence tools to perform advanced data discovery by asking questions using natural language. The system will translate that search analytics language query into a query that the analytics platform can interpret, and return the most appropriate answer in an appropriate form such as visualization, tables, numbers or descriptions in simple human language. Clickless Analytics interprets natural language queries and presents results through smart visualization and contextual information delivered in natural language.</p>
<p><em><strong>Can Clickless Analytics Simplify Adoption of Augmented Analytics?</strong></em></p>
<p>Clickless analytics, NLP and search analytics provide true data democratization of advanced analytics. Clickless Analytics incorporates NLP within a suite of Augmented Analytics features, leveraging computational linguistics, data mining, and analytical algorithms to provide a self-serve, natural language approach to data analysis. Search Analytics and NLP filter through mountains of data to answer a question in a way the user can understand, thereby simplifying and speeding the decision process and ensuring clarity.</p>
<p>Clickless Analytics suggests relationships and offers insight to previously hidden information so that business users can ‘discover’ subtle, crucial business results, patterns, problems and opportunities. Clickless Analytics provides maximum results and business user access with minimum implementation time and minimal training.</p>
<p>Clickless Analytics and an NLP approach to augmented analytics utilize a Google-type interface where business users can enter a question in human language, e.g., ‘What is our best-selling product in Arizona?’ or ‘Who is the best performing salesperson this year as compared to the previous year?’ The ease of use assures user adoption, and the clarity of analysis and reporting achieved by the enterprise results in an environment where the team, managers and executives can achieve rapid, accurate results without the assistance of IT or business analysts.</p>
<p>The evolution of search analytics and the application of NLP search within the confines of a business intelligence solution have allowed the average organization to leap forward with advanced data discovery and the incorporation of these crucial tools into a self-serve environment for user empowerment and accountability. Clickless Analytics and NLP help businesses to achieve rapid ROI and sustain low total cost of ownership (TCO) with meaningful tools that are easy to understand, and as familiar as a Google search. These tools require very little training, and provide interactive tools that ‘speak the language’ of the user.</p>
<p>Clickless Analytics, NLP and Search Analytics are a crucial component of business intelligence and Augmented Analytics, and are essential to business success and to building and sustaining a competitive advantage.</p>
<p><strong>Watch for Part II and Part III of this article series:</strong><span> </span>’What is Search Analytics and Can it Improve Self-Serve Data Discovery?’ and ‘What is Natural Language Processing &amp; How Does it Benefit a Business?'</p>



Easy Dashboards for Everyone Using Google Data Studio tag:www.analyticbridge.datasciencecentral.com,2018-01-11:2004291:BlogPost:380100
2018-01-11T23:30:00.000Z


Laura Ellis
https://www.analyticbridge.datasciencecentral.com/profile/LauraEllis



<div class=”sqs-block html-block sqs-block-html” id=”block-yui_3_17_2_4_1502412116261_16863″><div class=”sqs-block-content”><p>No matter the job, most professionals do some level of analysis on their computer.  There are always some data sets that live outside the walls.  Or, some analyses that we know could be performed better in a not-easily-sharable tool such as Excel, R, Python, SPSS, SAS and so on.</p>
<p>So how do you share your personal analysis with others?  Oftentimes, people export the graphs and tables to add into a presentation file.  One of the largest drawbacks to this approach is that it can cause versioning and updating nightmares.  </p>
<p>What if I told you that we could avoid all of this with dashboards?  Some of you may say, “Yes, obviously, Laura.  But I don’t have a licensed BI tool or BI experts at my disposal!  It’s not a realistic scenario for me.”  Now in the past, I might’ve agreed with you.  If you don’t have a paid BI tool, it can be tricky.  Free BI tool versions usually require the owner to host the software, or they limit the number of charts, viewers or users using the tool.</p>
<p>However, earlier this year, Google removed a number of restrictions to their free hosted dashboarding software called Google Data Studio.  Because of this, I decided to give the software a test drive and see how accessible it is to the non-BI expert.</p>
<p>Below I will take you through a tutorial that I wrote which should allow anyone to create a Google Data Studio dashboard about US Home Prices.  It should take about half an hour of your time.  It really is that easy.  So please, have a try and let me know how it goes!</p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220282810?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220282810?profile=RESIZE_1024x1024″ width=”750″ class=”align-full”/></a></p>
<div class=”sqs-block html-block sqs-block-html” id=”block-dd70af8bc85339bd772f”><div class=”sqs-block-content”><p></p>
<p><strong>The Tutorial Description</strong></p>
<p></p>
<p>For this tutorial I wanted to use some sample data to make a basic one page dashboard.  It will feature some common dashboard elements such as: text, images, summary metrics, summary tables and maps.  To do so I searched out free data sets and found out that Zillow offers summary data collected through their real estate business. </p>
<blockquote><em>Side note: Thank you Zillow, I love when companies share their data! </em></blockquote>
<p>I downloaded a number of the data sets that I thought would be interesting to display and did a little data processing to make dashboard creation easier.  From there I set out to make a dashboard without reading any instructions to see how usable it really is.  I have to say, it was easy!  There are some odd beta style behaviors that I outline below, but all in all it is a great solution. </p>
<p><strong>The Tutorial Steps</strong></p>
<p><strong>1. <span> </span></strong>Download the sample<span> </span><a target=”_blank” href=”https://github.com/lgellis/ZillowSampleData” rel=”noopener”>data set</a><span> </span>needed to create the sample.  </p>
<p>Note: if you have trouble downloading the file from github, go to the main page and select “Clone or Download” and then “Download Zip” as per the picture below.</p>
<p></p>
<p><a href=”http://storage.ning.com/topology/rest/1.0/file/get/2220297255?profile=original” target=”_self”><img src=”http://storage.ning.com/topology/rest/1.0/file/get/2220297255?profile=RESIZE_1024x1024″ width=”570″ height=”194″ class=”align-full”/></a></p>
<p>2.  Sign up for Google Data Studio</p>
<p>3.  Click “Start a New Report”</p>
<p></p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220302505?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220302505?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></p>
</div>
</div>
<p><span>4. In the new report, add the file "</span><a href="https://github.com/lgellis/ZillowSampleData/blob/master/Zillow_Summary_Data_2017-06.csv">Zillow_Summary_Data_2017-06.csv</a><span>" downloaded as part of the zip file from the data set in step 1.</span></p>
<p></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220303246?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220303246?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></span></p>
<p><span>5.  Modify the columns of the data set to ensure that “State” is of type “Geo”&gt;”Region” with no aggregation and the remaining columns are type “Numeric” &gt;”Number” with “Average” as the aggregation.</span></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220307925?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220307925?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></span></p>
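<p>The typing rule in step 5 is simple enough to state as code: "State" becomes a Geo &gt; Region dimension with no aggregation, and every other column becomes a Number averaged across rows. Data Studio applies this in its UI; the sketch below merely mirrors the rule, and the metric column names are hypothetical.</p>

```python
def classify_column(name):
    """Mirror the step-5 typing rule for a data-source column."""
    if name == "State":
        return {"type": "Geo > Region", "aggregation": None}
    return {"type": "Numeric > Number", "aggregation": "Average"}

# Illustrative column names, not the data set's actual headers.
columns = ["State", "MedianListingPrice", "MedianRentalPrice"]
schema = {c: classify_column(c) for c in columns}
```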
<p>6.  Click “Add to Report”.  This will make the data source accessible to your new report.</p>
<p>Now we are ready to start building the report piece by piece.  To make it easier, I have broken up the dashboard content into 5 pieces that can be added.  We will tackle these one by one.</p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220308037?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220308037?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></p>
<p><span>To add each of the components above, you will need to use the Google Data Studio Toolbar on the top navigation.  The image below highlights each of the toolbar items that we will be using.</span></p>
<p></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220308209?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220308209?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></span></p>
<p>7.  “A. Text”- Easy street. Let’s add some text to the dashboard. Start by clicking the “Text” button highlighted in the toolbar above.  Next, take the cross-hair and drag it over the space you want the text to occupy. Enter your text: “US Home Prices”.  In the “Text Properties” select the size and type.  I’m using size 72 and type “Roboto Condensed”.</p>
<p>8. “B. Image”- Easy street part 2.  Now we simply add a pretty picture to the dashboard.  Start by clicking the “Image” button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the image to occupy.  Select the image “houseimage.jpg” that you downloaded from the GitHub repo.</p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220309689?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220309689?profile=original" width="295" height="336" class="align-full"/></a></p>
<p></p>
<p>9. “C. Scorecard Values”- Now we get into the real dashboarding exercises through metrics and calculations.  Start by clicking the “Scorecard” button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the first scorecard value to occupy. In the “data” tab, select the data set and appropriate metric.  Start with the values in the image to the left.  In the “style” tab select size 36 with the type “Roboto”.</p>
<p>Repeat this for every metric in the “C. Scorecard Values” section.</p>
<p></p>
<p><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220309941?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220309941?profile=original" width="296" height="357" class="align-full"/></a></p>
<p></p>
<p><span>10. “D. Map” – In this step the result gets more impressive, but no more difficult.  We implement a map! Start by clicking the “Geo Map” button highlighted in the toolbar above. Take the cross-hair and drag it over the space you want the map to occupy.  Select the data set and appropriate metric as per the values in the image to the left.</span></p>
<p></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220310346?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220310346?profile=original" width="178" height="407" class="align-full"/></a></span></p>
<p><span>11. “E. List”- Now we are going to list out all values in the Geo Map above ordered by their metric “Average Home Value”.  Start by clicking the “Table” button highlighted in the toolbar above.  Take the cross-hair and drag it over the space you want the list to occupy. Select the data set and appropriate metric as per the values in the image to the left.</span></p>
<p></p>
<p><span><a href="http://storage.ning.com/topology/rest/1.0/file/get/2220310588?profile=original" target="_self"><img src="http://storage.ning.com/topology/rest/1.0/file/get/2220310588?profile=RESIZE_1024x1024" width="750" class="align-full"/></a></span></p>
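<p>Conceptually, the table in step 11 is just the per-state metric sorted from highest to lowest. A minimal equivalent over hypothetical (state, value) pairs, to make the underlying operation explicit:</p>

```python
# Illustrative per-state averages; the real values come from the Zillow data.
averages = {"HI": 610000, "DC": 530000, "CA": 500000, "OK": 120000}

# Sort descending by the metric, the same ordering the dashboard table shows.
ranked = sorted(averages.items(), key=lambda kv: kv[1], reverse=True)
for state, value in ranked:
    print(f"{state}: ${value:,.0f}")
```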
<p><span>12.  Make the Report External and Share.  Click the person + icon in the top right of your screen.  Select “Anyone with the link can view”.  Copy the external URL and click done.  Now take that external URL and send to all your friends and family with the subject “Prepare to be amazed”.</span></p>
<p></p>
<p><span>13.  Optional Embedding: Google Data Studio now also supports embedding reports and dashboards directly in a web page.  </span></p>
<p></p>
<p>And there you have it, your dashboard is created and you can share away!</p>
<p></p>
<p><strong>Some Criticisms</strong></p>
<p>As I’m sure was obvious from above, I’m impressed with their offering.  But I do feel it is my duty to outline some oddities I came across.  For example: when you set up your data source, you must specify ahead of time, for each column, what type of aggregation you plan to use with that value.  If you want a chart to display averages, you cannot select this dynamically within the chart; it has to be set at the data source.  I find this odd and limiting.  Additionally, the CSV import has a 200-column limit and there are some formatting annoyances.  </p>
<p></p>
<p><strong>Final Note</strong></p>
<p>I’m happy that I tried out Google Data Studio.  While it does not meet my current needs at the enterprise level, I am very impressed at its applicability and accessibility to the personal user.  I truly believe that anyone could make a dashboard with this tool.  So give it a try, impress your colleagues and mobilize your analysis with Google Data Studio!</p>
<p>Written by Laura Ellis</p>
<p></p>
<p>Original Blog Post <a href="https://www.littlemissdata.com/blog/dashboards-for-everyone" target="_blank" rel="noopener">Here</a></p>
<p></p>
</div>
</div>