The Dallas Data Science Conference (2018) was amazing! Overall, there were a lot of incredible ideas discussed, and I met several great people.
In this article, I'll discuss the highlights of the experience as well as some of the cutting-edge ideas that were shared.
The conference was hosted by IDEAS (the International Data Engineering And Science Association) at the University of Texas at Dallas (Richardson, TX) on February 10, 2018.
I arrived a little early and had the opportunity to meet one of the main organizers, Randy Lao, who did an outstanding job.
I grabbed a seat and was ready to learn.
Speakers and Topics
The speakers at the conference this year were phenomenal! The topics were interesting, and the presentations were excellent. The image below captures the topics that were very popular this year:
The conference hosts did a magnificent job! All three of them were very engaging and captivating.
Two of the conference hosts this year were the very entertaining and enthusiastic married duo, "Coach" Culbertson and Kimberly Culbertson. I had the opportunity to chat with them a little bit at the conference after party. I didn't realize that conference hosting as a service even existed. They offer exactly that, and they do an excellent job! If you're interested, you can check them out here.
The third conference host was Mike C. Matthews. He also did a wonderful job and kept everyone engaged. I got to chat with Mike at the after party, and he shared some beneficial insights on entrepreneurship.
After Party Networking Event
After the conference, there was an after party at the British Beverage Company in Uptown (Dallas, TX), which was a lot of fun.
There was a huge turnout, and several of the conference speakers were able to make it too, which led to lots of insightful conversations. The event hosts were also there, making it even more enjoyable.
I met a lot of great people at the after party. It was an enjoyable way to debrief some of the ideas discussed and end the day on a high note.
Key Conference Takeaways
There were so many innovative ideas shared at the conference. I took a ton of notes and wanted to share some of the key takeaways. As a side note, there were several other distinguished speakers at the conference who aren't listed below.
1. Jupyter Notebook + Google Colab: Tarek Hoteit
The first session I attended was by Tarek Hoteit. He did an excellent job and infused some fantastic storytelling throughout his presentation, which really made it engaging and interesting. You can check out his slide deck here.
One key takeaway from Tarek's presentation was the ability to use Jupyter notebooks with Google Colab. I've had a lot of experience using Jupyter notebooks with Python, but I was unfamiliar with Google Colab. It's essentially a free, hosted Jupyter notebook environment that runs in the cloud. Colab integrates with Google Drive, making it a significant game changer for shared Python development.
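To make the Colab and Drive connection a little more concrete, here's a minimal sketch of how a notebook can check whether it's running in Colab and, if so, mount Google Drive. It relies on the `google.colab` package, which is only importable inside Colab. The helper names and the `/content/drive` mount point are my own illustrative choices, not something taken from Tarek's slides.

```python
def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only available inside Colab)
    except ImportError:
        return False
    return True


def mount_drive_if_possible(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    if not running_in_colab():
        return False
    from google.colab import drive

    drive.mount(mount_point)  # asks for authorization inside the notebook
    return True
```

The same notebook can then run unchanged on a laptop, where the mount step is simply skipped.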
2. Process Mining with Machine Learning & Artificial Intelligence: Viswanath Puttagunta
Viswanath Puttagunta did an exceptional job with his presentation. It was a perfect balance of the big picture for those unfamiliar with the topic, as well as technical details to give practitioners some tips to try out.
Process optimization is near and dear to my heart, so I really enjoyed this session. For businesses, process drives everything. It's absolutely critical that an organization's processes are efficient, effective, and optimized. In my opinion, achieving this is one of the biggest challenges for businesses today, especially with the amount of data complexity involved.
A useful takeaway from Vish's presentation was that you can use machine learning and artificial intelligence to model and improve processes. Vish is the Chief Technology Officer at Divergence AI, a company that specializes in this capability.
Example: Vish walked us through some real-world examples of this approach in action, including Walmart's need to prevent late shipments from suppliers.
Walmart has a significant need for both supply prediction and demand prediction. How else can Walmart ensure that products are efficiently sourced from suppliers and delivered based on constant fluctuations in demand? As Vish pointed out, effective process mining makes it possible to optimize these processes and make them more predictable.
Bottom Line: With the level of complexity and the amount of data involved in processes today, machine learning (ML) and artificial intelligence (AI) are powerful tools that can help. Vish demonstrated what it looks like to use ML and AI for process discovery. In addition, he showed us what metrics we should be looking for in terms of fitness, simplicity, precision, and generalization.
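For readers who want a feel for what process discovery means in practice, here's a toy sketch in plain Python. It is not Vish's method or Divergence AI's tooling. It builds a simple "directly-follows" model from a made-up event log, then scores it with a crude fitness measure (the share of traces the model can fully replay) and a crude simplicity measure (the number of transitions). The event log, function names, and the `min_count` threshold are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical event log: each trace is the ordered activities for one shipment.
event_log = [
    ["order", "pick", "pack", "ship", "deliver"],
    ["order", "pick", "pack", "ship", "deliver"],
    ["order", "pack", "pick", "ship", "deliver"],  # pick and pack swapped
    ["order", "pick", "ship", "deliver"],          # packing skipped
]


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        counts.update(zip(trace, trace[1:]))
    return counts


def discover_model(log, min_count=2):
    """Keep only transitions seen at least min_count times (a crude noise filter)."""
    return {pair for pair, n in directly_follows(log).items() if n >= min_count}


def trace_fitness(log, model):
    """Fraction of traces whose every step is allowed by the model."""
    fitting = sum(all(p in model for p in zip(t, t[1:])) for t in log)
    return fitting / len(log)


model = discover_model(event_log)
fitness = trace_fitness(event_log, model)  # 0.75: three of four traces replay
simplicity = len(model)                    # 5 transitions; fewer is simpler
```

Raising `min_count` gives a simpler model that fits fewer traces, which is the kind of trade-off these metrics are meant to expose. Precision and generalization need more machinery than this sketch covers.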
Vish was also at the conference after party, and I got the opportunity to talk to him about Divergence AI and the types of problems they're solving for clients.
3. How to Scale Analytics: Sergey Maydanov
Of all of the presentations I saw, Sergey Maydanov's was the most technically deep. Sergey didn't pull any punches during his session, and the entire presentation was full of incredible insights into the scalability of analytics.
Sergey works at Intel as a software team lead. He mentioned that one of the challenges that Intel is currently working on is how to achieve scalability of analytics for very large data sets. Sergey also said that Intel is developing CPUs that can scale on big data problems.
One important takeaway was that both software and hardware efficiency are absolutely critical for large-scale analytics. Sergey said that Intel is working hard on both the software and the hardware aspects of this problem.
He hinted that on the hardware side, Intel is going after a multi-threading and multi-node design approach. This requires chips with more cores in order to handle big data problems that continue to scale in size.
On the software side, Intel is also working on libraries that are optimized for their new chip designs. Check out the video below about Intel DAAL and Intel MKL to learn more about these libraries.
Another interesting takeaway from this session was that data transfer between devices or locations is becoming very costly, especially when dealing with very large data sets. In order to decrease this cost, Sergey recommended that as much analysis as possible should occur close to the data source. He also discussed multiple strategies for how this can be achieved.
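Here's a small sketch of the "analyze close to the data" idea, under my own assumptions rather than anything Sergey showed. Each site reduces its raw readings to a tiny summary (a count and a sum), and only those summaries travel to a central place where they're combined into a global mean. The site names, values, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PartialStats:
    count: int
    total: float


def summarize_locally(values):
    """Runs where the data lives; returns a tiny summary instead of raw rows."""
    return PartialStats(count=len(values), total=float(sum(values)))


def combine(partials):
    """Runs centrally; merges the per-site summaries into a global mean."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return total / count if count else float("nan")


# Hypothetical readings held at three separate sites.
site_data = {
    "site_a": [10.0, 12.0, 11.0],
    "site_b": [20.0, 22.0],
    "site_c": [30.0],
}

partials = [summarize_locally(values) for values in site_data.values()]
global_mean = combine(partials)  # 105.0 / 6 = 17.5
```

No matter how many rows each site holds, only two numbers per site cross the network, which is where the savings come from.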
One challenge today for data scientists and data engineers is how to streamline data cleaning. A significant amount of time is required for data cleansing, and automation can help. Therefore, being able to scale this automation is critical.
Toward the end of his presentation, Sergey showed us some data on the speed increases that Intel has achieved on analytics processing. The performance data he shared was for scikit-learn (a Python machine learning library) from the Intel Distribution for Python, deployed on Google Cloud Platform using 96 vCPUs on Intel Xeon processors. The result was over 20 times faster than the benchmark!
Sergey talks more about how Intel is helping to take Python to the next level in the video below:
4. Today's Big Data Challenges: Jerry Watson
After lunch, Jerry Watson hosted an hour-long session on the challenges we face today with big data and artificial intelligence. Jerry is an incredible speaker! He has an energetic presence that made his presentation very captivating.
One insightful takeaway from Jerry's presentation was that scalability is one of the biggest barriers right now in big data and artificial intelligence. Jerry also pointed out the need for real-time processing that can deliver actionable data for faster and better decision making. These points built on Sergey's presentation from earlier in the day (see above).
In addition, Jerry discussed other issues that we face for scaling business intelligence. These challenges included "data silos", integrating third-party data effectively, keeping up with streaming data, and managing large data sets that are rapidly growing.
Jerry then followed up his presentation with a Q&A session with experts Padmanand Warrier and Sadu Hegde. One of the topics discussed was the difference between a data lake and a data warehouse.
5. Using Blockchain for the Supply Chain: Vipul Tiwari
Vipul Tiwari presented on how blockchain technology can be used to improve the integrity of supply chains. His passion for blockchain radiated throughout his presentation and energized the audience. Vipul definitely earned the enthusiastic applause at the end of his presentation.
The biggest takeaway from Vipul's presentation was how blockchain technology is going to be a game changer over the next decade. Most people are familiar with blockchain only through cryptocurrencies like Bitcoin. However, Vipul pointed out that blockchain will change almost everything in our lives, much like the internet did.
An example that Vipul walked us through was Walmart teaming up with IBM to use blockchain technology to rapidly detect food contamination, which carries a significant cost each year. In essence, blockchain can enable end to end transparency throughout the entire supply chain and greatly reduce the amount of time it takes to track contamination or other issues.
For example, tracing the origin of contaminated food can take 3-4 weeks using existing industry practices. However, using blockchain technology could reduce the time required for that same trace to mere seconds. Imagine the impact this could have on other supply chains.
In addition, Vipul pointed out some incredible uses for blockchain within supply chains. These included tracking medicines that have stringent storage and shipment requirements, catching counterfeits, and verifying the source locations of diamonds (to confirm they're conflict-free), to name a few. Moreover, payments to suppliers can be made conditional on certain requirements being met.
In essence, using blockchain technology within supply chains can increase trust, create greater efficiency, and eliminate the need for expensive, central accounting systems and auditing.
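The traceability described in this section rests on a simple building block: each record stores a hash of the one before it, so changing any earlier record breaks every later link. The sketch below shows that idea for a single lot of produce. It's a single-process, hash-linked log written for illustration only. It has no network, consensus, or smart contracts, and it is not the Walmart and IBM system. The lot ID, step names, and function names are made up.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(chain, event):
    """Add a supply chain event, linked to the hash of the previous block."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "event": event, "prev_hash": prev}
    block["hash"] = block_hash(block)  # hash computed before the key is added
    chain.append(block)


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev = "0" * 64
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != prev or block["hash"] != block_hash(body):
            return False
        prev = block["hash"]
    return True


def trace(chain, lot):
    """Return the recorded steps for one lot, in order."""
    return [b["event"]["step"] for b in chain if b["event"]["lot"] == lot]


chain = []
for step in ("harvested", "packed", "shipped", "received"):
    append_event(chain, {"lot": "LOT-42", "step": step})
```

Calling `trace(chain, "LOT-42")` returns the full history in one pass, and editing any stored event makes `is_valid(chain)` return `False`.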
6. Blockchain + Machine Learning = Greater Trust: Mark Lynd
Right after Vipul's session, Mark Lynd dazzled the crowd with an incredible presentation. There was a lot of enthusiastic energy in the room after Mark finished his presentation, as everyone was drinking the blockchain "Kool-Aid". Vipul's presentation right before got everyone excited about blockchain, and Mark's session sealed the deal.
Mark is a world-class speaker who will keep you on the edge of your seat. I really enjoyed his presentation.
Some incredible takeaways from Mark's presentation:
- Blockchain technology is very robust. Only quantum computing is a threat to blockchain at this point in time.
- Blockchain operations are starting to include large data sets. There's a significant need for machine learning and big data techniques to be integrated with blockchain to deal with this problem.
- Machine learning is the logical next phase for blockchain including the use of deep learning and neural networks (AI).
- Mark pointed out that large American corporations still have significant issues with data integrity even though most companies won't admit it. I can attest to this! It has been one of the biggest challenges that I've faced as a data engineer.
- One impactful goal of blockchain is to enable decentralized intelligence. With blockchain, if one node gets taken out, the intelligence of the system can still survive.
- Mark also told us about the Hade Platform, which is an Ethereum based cryptocurrency used to access news and financial analysis information. He said they're taking on big players in the industry and have developed disruptive machine learning capabilities that generate insights about companies. Consequently, they can build investment funds with documentation on blockchain with dynamic, smart contracts.
- Another example Mark told us about was Boon Tech, a freelance platform that is using AI and blockchain technology to minimize service fees. He mentioned that they're using machine learning to match talent with clients and artificial intelligence to protect user identities. Check out the short video about Boon Tech below:
Overall, the 2018 Dallas Data Science Conference was a great experience! I'm so glad that I had the opportunity to attend, and I'm really looking forward to the next IDEAS conference.
A big thanks to IDEAS for making it possible, as well as to Randy Lao, and the conference hosts: "Coach" Culbertson, Kimberly Culbertson, and Mike C. Matthews.
About the Author: Pete Thompson
Hi, I'm Pete Thompson, the founder of www.DataIsBeauty.com. Our vision for 'Data is Beauty' is to be an incredible resource for anyone interested in data concepts.
Thank you very much for sharing your experience here! I find this article really useful, and it serves as a great refresher of the major takeaways from the conference.